I wrote a piece for Germany’s largest independent tech magazine from the renowned Heise publishing house, and they kindly translated it for English-speaking readers. It’s about how the interpolatable nature of latent spaces is incompatible with copyright as we know it, and why the advent of AI technology therefore requires a radical shift in our thinking about intellectual property.
Find below the edited piece in full (with permission), interspersed with some Stable Diffusion generations for variations of the prompt "The Velvet Underground and Nico playing live on stage at a night club".
All Tomorrow’s Parties
AI Synthesis – The End of Copyright as we knew it
AI systems for image and sound synthesis are stochastic libraries capable of interpolation. They require a radical reorientation of copyright law.
In the age of machine learning, our notions of intellectual property and copyright are facing radical upheaval. Upcoming lawsuits against companies offering generative AI systems raise questions about what exactly art and creativity are, and why and in what ways we protect and promote them.
In mid-January 2023, stock photo provider Getty Images initiated legal action against Stability AI in the UK, followed in early February by a suit in the US. Previously, three artists had filed a lawsuit accusing the company of violating their copyrights with Stable Diffusion. In their initial reactions to ChatGPT, publishers are calling for an extension of ancillary copyright to generative AI systems.
Collecting societies that manage their members' copyrights, such as GEMA or VG Wort in Germany or their US equivalent BMI, face a daunting task. These novel AI tools turn their distribution mechanisms into potential prey for fraudsters and hacks, who can deceive these systems with easy-to-use software and boost monetary distributions in their own favor: AI-generated content, the automated synthesis of plausible but inauthentic texts, images and audio produced purely to pump up the volume, is capable of blowing up the existing systems.
Stochastic box of chocolates
In a piece for the Wall Street Journal, Alison Gopnik, a professor of psychology and philosophy at Berkeley, described these new generative AI models as "library-like cultural technologies that provide access to and multiply knowledge". The comparison is apt, if incomplete, and I'd accordingly describe these interpolatable data spaces computed by algorithms, called latent spaces, as "stochastic libraries": a library where you describe the book you want to a robotic librarian, and it picks out an approximate match. Put another way: "AI is like a box of chocolates – you never know what you're gonna get."
Stochastic libraries are interpolatable databases of their training data: AI systems learn various characteristics of the input through pattern recognition and store them as so-called weights, which can be controlled via parameters. Stable Diffusion has 870 million parameters; ChatGPT has 175 billion. If you create an AI model for paintings by Pablo Picasso, for instance, the neural network stores the patterns recognized in its training data for stylistic features such as brushstrokes, coloring or proportions.
These, in turn, can be controlled via text prompt: if you want to create an image in the style of the master with this Picasso AI, you activate the parameters for "vase", "flowers", "fruit" and "Picasso", and the model creates a still life based on the weights of these patterns in its database. The same thing happens in ChatGPT when you remix a Heise IT text in the style of a Ramones song; and it is precisely this principle of remix on a molecular, interpolative level that is explosive for existing copyright systems.
Interpolative nature of AI models
By the very nature of prompt input, which decomposes its input into various tokens (syllables and groups of letters), many of these weights and parameters are part of any output of generative AI. This is one of the reasons why the lawyers in the Stable Diffusion lawsuit refer to these systems as "collage tools of the 21st century". This choice of words, however, obscures the interpolative character of these models: each image is generated on the basis of many different parameters, which were previously obtained in AI training from millions of image analyses.
Each synthetic image, piece of AI music or generative text is always the result of a multidimensional interpolation within latent space: one generates, say, a five-dimensional space from the parameters "robot", "dog", "meadow", "Picasso" and "flowers", full of possible image syntheses, from which the final syntheses are selected at random (in diffusion models) or according to a reward function. Through text prompts, any pattern contained in the database can be combined with countless others to create novel remixes; and so our AI Picasso is suddenly painting robots and spaceships that the wetware Pablo never painted in real life.
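This kind of interpolation can be sketched in a few lines of toy code. Everything here is illustrative: real latent spaces have hundreds of millions of learned dimensions, and prompts pass through trained encoders, not hand-picked vectors.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "latent space": each concept a prompt can activate is a direction
# in a shared vector space. Real models learn these from millions of
# image analyses; here we just fake 8 dimensions per concept.
concepts = {
    name: rng.normal(size=8)
    for name in ("robot", "dog", "meadow", "picasso", "flowers")
}

def interpolate(weights: dict) -> np.ndarray:
    """Blend the activated concept vectors into one point in latent space."""
    total = sum(weights.values())
    return sum((w / total) * concepts[name] for name, w in weights.items())

# "robot dog in a meadow with flowers, in the style of Picasso"
point = interpolate({"robot": 1.0, "dog": 1.0, "meadow": 0.5,
                     "picasso": 2.0, "flowers": 0.5})

# Every different weighting picks a different point from the continuum
# between the training patterns, i.e. a different synthesis.
```

Nudging a single weight slides the result continuously between training patterns, which is exactly why no single source work is identifiable in the output.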
This ability to interpolate between data points poses unprecedented problems, and not just for copyright law: Synthetically generated AI voices are already causing protests among voice actors, who found new clauses in their contracts demanding rights to use their voice recordings to train synthetic copies. Unions advise against signing such contracts, but it's only a matter of time before movie producers can create any voice imaginable in any tonality, purely by interpolating between the individual patterns in the data set. The new villain of the Marvel universe is supposed to sound like Ted Brolin, but with the voice coloration of Bruce Willis and the rhythm of Pee Wee Herman? There’s an AI for that.
The training data of generative artificial intelligences, which often contain copyrighted works, are thus converted into parameter banks for "new" synthetic outputs. Well-known science fiction author Ted Chiang, whose short story "Story of Your Life" was adapted into Denis Villeneuve's film "Arrival", compared large language models to the lossy data compression of JPEGs – a metaphor that seems entirely appropriate considering the dissolution of existing culture into an atomized grey goo of latent space.
Quo vadis, copyright?
The inherent randomness of a stochastic library and the interpolative nature of AI synthesis fundamentally contradict the principles of U.S. and European copyright law, which requires individual, identifiable works by natural persons and a certain level of creativity before protection takes hold. How such copyrights can or should respond to an interpolatable latent space in which patterns of existing works can be freely combined at a creative molecular level is entirely unclear, and complaints will for now be decided, as a lawyer would say, "on a case-by-case basis".
However, two studies have strongly suggested that diffusion models are capable of exactly replicating the image data used to train them (arXiv preprints: "Investigating Data Replication in Diffusion Models" and "Extracting Training Data from Diffusion Models"), which not only enables copyright infringement with these models but can lead to privacy violations, too.
Complicating matters further is the commercial application of these AI systems. It is true that they were created in a scientific framework and can therefore rely on exceptions in intellectual property law in Europe and the USA, at least during their development. However, these exceptions are subject to higher legal standards for commercial applications, and Stability AI, OpenAI and Microsoft have already built and marketed multiple commercial applications based on these AI systems. This is one reason why the Federal Trade Commission is investigating OpenAI for neglecting due diligence during the launch of ChatGPT.
Endless mash-ups of atomized culture
Copyright holders' collecting societies so far have no working strategies for approaching these endless stochastic AI mashups of atomized culture. Even if creators and rights managers find ways to regulate the stochastic nature of these novel cultural synthesizers in future copyright reforms, black markets will emerge for models that allow users to freely explore the new synthetic worlds. In fact, they already exist: there are hundreds of checkpoints (CKPTs) for Stable Diffusion, derivative AI models that have been trained on the works of specific artists and the aesthetics of whole subcultures. There is even a Stable Diffusion model for the movie "Cats".
Already today it is possible to build your own image generator based on Stable Diffusion and various open source tools, mixing new image worlds from different checkpoint files like ingredients in cooking: "Specialized CKPT with Cats, Star Trek and Ghibli, please", and out comes a gigantic latent space featuring anime cats from the planet Vulcan, guaranteeing infinite visual worlds. Thinking further into the future, brain-computer interfaces come into view, allowing real-time visualization of thoughts: digitally enabled lucid dreams while awake. The dystopian prospect of Disney trying to control thoughts, at least in their visualized output, is not far off: "I can't show that, Dave."
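Checkpoint mixing of the "ingredients in cooking" kind boils down to something as simple as a weighted average of two models' parameter tensors. The sketch below uses hypothetical one-tensor checkpoints; the names and the single merge formula are illustrative, not the actual file format or tooling.

```python
import numpy as np

def merge_checkpoints(ckpt_a: dict, ckpt_b: dict, alpha: float = 0.5) -> dict:
    """Weighted-sum merge: alpha * A + (1 - alpha) * B for every tensor."""
    return {key: alpha * ckpt_a[key] + (1 - alpha) * ckpt_b[key]
            for key in ckpt_a}

# Hypothetical mini-checkpoints standing in for a "Cats" and a "Ghibli" model
cats   = {"unet.weight": np.array([1.0, 2.0])}
ghibli = {"unet.weight": np.array([3.0, 4.0])}

# 70 percent cats, 30 percent Ghibli
mixed = merge_checkpoints(cats, ghibli, alpha=0.7)
# mixed["unet.weight"] is 0.7 * [1, 2] + 0.3 * [3, 4] = [1.6, 2.6]
```

The merged model is neither checkpoint, but a new point between them: the same interpolation principle, applied to entire models rather than single prompts.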
Problems in exploiting copyrights are increasing
These problems posed by generative AI systems will likely intensify soon. In January 2023 alone, a dozen new technologies for generative music were unveiled, exemplified by Google's MusicLM: AI systems for endless musical tapestries "in the style of" are within reach. Recently, a synthetic AI song by unauthorized AI versions of Drake and The Weeknd went viral on TikTok, and a new album by a deepfaked AI Oasis produced a more authentic Oasis sound than the records of the long-disbanded original band.
A few days ago, singer Grimes open-sourced her voice for use in AI synthesizers, asking for 50 percent royalties in return if the songs are successful, while Tom Graham, CEO of the deepfake startup Metaphysic, has applied for copyright on the AI version of his likeness. A telltale sign of a future of copyright in which artists may hold shares in artworks and cultural products to which they contribute data from personality patterns, such as voice color, brush stroke, word choice, and so on.
In the coming decades, AI models for video and 3D will enable latent spaces akin to movies explorable in real time, where prompt engineering gives approximate control over the output, letting users customize details like lead actors, matte paintings, color grading or costuming.
Mutating Matrix, impossible to regulate in the long run
Once a fixed, unique, identifiable work, The Matrix turns into a stochastically explorable space in which one is able to fly with Neo through a mashup of "Metropolis" and "The Neverending Story" on the back of lucky dragon Falkor: novel amusement parks as DLC (Downloadable Content) for the new gaming system Latent Space, featuring infinite music streams performed by virtual musicians.
The very first collisions of law with the principles of the digital mainly concerned the rights of privacy and authorship. AI syntheses in stochastic libraries are the latest, and admittedly gigantic, step in this development. The dissolution of human cultural work into a digitally computed, multidimensional latent space poses another, almost insurmountable legislative task.
If one follows the old principles of clearly identifiable works by natural persons, the cultural space will eventually become impossible to regulate. A further extension of copyright law, however, for example to stylistic features as in trademark law, runs the risk of limiting creative human expression.
Outlook: Shutterstock's monetization approach
Recent monetization approaches presented by Shutterstock show a viable path to the future: with micropayments per synthesized image, artists and photographers are paid for their contributions to a training dataset. Whether this can compensate for the loss of income from creative work and royalties remains to be seen; after all, the value of creative work has been under enormous economic pressure for years from the emergence of globalized design platforms, and image synthesis further devalues visual creative labor. Also conceivable as a compensation strategy are regular payments equivalent to the GEMA copying levies on CD burners, copy machines or smartphones: an automatic tax on the use of creative work, paid by providers of generative AI services. If you pay twenty bucks per month for Midjourney, five go into a bucket for artists, distributed by collecting societies.
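The "five bucks into a bucket" idea reduces to a pro rata split. A minimal sketch, with entirely hypothetical artists and contribution counts:

```python
def distribute(pool: float, contributions: dict) -> dict:
    """Split a royalty pool in proportion to training-data contributions."""
    total = sum(contributions.values())
    return {artist: pool * count / total
            for artist, count in contributions.items()}

# 5 of the 20 monthly bucks go into the pool, split by images contributed
payouts = distribute(5.0, {"alice": 3000, "bob": 1000, "carol": 1000})
# payouts -> {"alice": 3.0, "bob": 1.0, "carol": 1.0}
```

The hard part, of course, is not the arithmetic but determining those contribution counts per synthesis, which is precisely what collecting societies would have to audit.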
Whether through distribution procedures based on pro rata contributions to training data, or through novel, pragmatic mergers of personality rights and copyrights, as in the case of Grimes releasing her voice for AI use: legislators, collecting societies and, not least, artists and creators face a rocky road to keep creative work competitive in the future. Their contribution to the social good is undisputed even in times of stochastically interpolatable indexing of culture – in a world where a virtual Kurt Cobain performs Nirvana songs in the style of vaporwave tracks in my living room – "for all tomorrow's parties".