If Only You Could See What I´ve Seen With Your Eyes
Plus: Generative AI-Music is here / ChatGPT on an Apple II / ChatGPT Hacker Simulator / Neural Scene Chronology of 5pointz / Doraemon manga from 1970s predicted generative AI and much, much more
3D-scenes from eye reflections
In Seeing the World through Your Eyes researchers reconstructed 3D-scenes from the reflections in your eyes. The results are not very good yet, especially their test in the wild pretty much only reveals a black cube or whatever.
I compiled their examples into one video, followed from one in-the-wild-test with Miley Cirus:
Snip from the abstract:
Our method jointly refines the cornea poses, the radiance field depicting the scene, and the observer's eye iris texture. We further propose a simple regularization prior on the iris texture pattern to improve reconstruction quality. Through various experiments on synthetic and real-world captures featuring people with varied eye colors, we demonstrate the feasibility of our approach to recover 3D scenes using eye reflections.
The tech surely is not there yet, and this reminds me of the 2014 paper The Visual Microphone where they extracted identifyable audio from the video of vibrations on the surface of a bag of potatoe chips, or the age old zoom and enhance meme ofcourse, and i’m very convinced that stuff like the potatoe-chip-tech is used in intelligence circles for years now — why should you get a microphone near a target when you can deploy hires-videocams from far away? So, i consider this stuff to be of very high interest to a bunch of people.
Also, this AI-enhanced Eye-NeRF-pipeline clearly is an early experiment, and we are far away from real world applications or headlines like “Dead mans’ eye reflection used in murder trial“ or whatever, but the tech may soon be used to refine Apples’ AR-applications, for instance.
True Generative AI-Music is here
Meta released their open source music model MusicGen (code on Github) and while we’ve seen a lot of talk about AI-Drake and AI-Oasis, this was never true generative music like OpenAIs Jukebox or the infinite neural streams from the Dadabots and more akin to Deepfakes, where people just edited their voice with AI. This is also true for that new Beatles song, in which they used AI-tech to clean up bad audio to turn a well known Lennon-demo into a final fab four song. (There already exists an AI-made version of that song, and we’ll see how it holds up agaist McCartneys.)
All of this is awesome (minus the shitty tunes by AI-Drake ofcourse), but this stuff is not generative AI.
MusicGen is and as far as i can hear, it’s lightyears ahead of Jukebox and better than Google MusicLM. Here’s an example of a typical Jukebox-output which very fast detoriates into noise and compression artifacts.
In comparison, here’s what MusicGen can do: a cover version of A-ha’s Take On Me in a stable song structure with three minutes runtime in good quality. Sometimes the song structure isn’t stable and it leads to fun outcomes, like this AI-try at Metallicas Master of Puppets.
My own experiments with MusicGen only yielded subpar but sometimes interesting results. I used the Huggingspace playground and uploaded snippets of those tracks to condition the model on a melody, which worked so-so most of the time and went very weird on The Trashmen. Here’s a bunch:
Flock of Seagulls - I ran (pop punk AI-version)
The Trashmen - Surfin’ Bird (punk AI-version)
David Bowie - Heroes (minimal techno AI-version)
While that Huggingspace-demo only generates fifteen seconds of audio, there are Colabs which produce more than that (I didn’t test those, though). There’s also finetuning for MusicGen already, like specialized image synthesis models that give you anime, you can train MusicGen on any artist or label or genre you like, and you can split audio tracks from your music collection directly to MusicGen in this Huggingspace.
You see where we are heading with AI-music: Very soon, you’ll input all Radiohead-music to a generative AI-model and have Thom Yorke playing Metallica songs all day long in the style of OK Computer. Also: vocals gonna be the mutant fingers of generative music, or the typography.
I already wrote a long piece about how this essential property of AI-music, the interpolation of form and style, goes against the very foundations of copyright systems, and new business models around this principle are only at their embryonic stage at this point, with AI-Grimes pointing at a future in which Spotify may provide band-radiostations which play tunes “in the style of“ all day long, where said bands get microdeals based on their contributions to datasets.
Whatever those business models will look like — this is the moment the music industry feared since ChatGPT brought generative AI into the public eye. Don’t forget that these results are very much like Dall-E 1, and are not really usable for music production as is. But these results will improve, fast, and while true good songs might be far off, i can absolutely see this stuff taking over some genres very soon, first and foremost dancemusic ofcourse, where human performers always were something like an add-on. Just look at Daft Punk, the most successful dance act of all time, a semi-anonymous duo played by robots, while the main act in their shows always was the stage itself. Dance and Techo always fooled around with anonymity, and the Algo Raves of the early aughts added some dev edge to the beats. Making the Deejay an Artificial Intelligence feels like a perfect fit.
Here’s Nao Tokui showing you how that sounded and looked like one year ago. He’s working on a 4-track mixer for four generative AI-music-models and this is just the beginning. AI music gonna be wild.
Links
Google is reconstructing indoor spaces with NeRF. Indoor spaces on street view are long established, i expect those to improve and become true 3D spaces soon.
In a new paper researchers deploy novel prompt injection techniques on 36 LLM-apps and hack 31 of those, including Notion: “HouYi, a novel black-box prompt injection attack technique, which draws inspiration from traditional web injection attacks. HouYi is compartmentalized into three crucial elements: a seamlessly-incorporated pre-constructed prompt, an injection prompt inducing context partition, and a malicious payload designed to fulfill the attack objectives. Leveraging HouYi, we unveil previously unknown and severe attack outcomes, such as unrestricted arbitrary LLM usage and uncomplicated application prompt theft. We deploy HouYi on 36 actual LLM-integrated applications and discern 31 applications susceptible to prompt injection. 10 vendors have validated our discoveries, including Notion, which has the potential to impact millions of users.“
Rebuff, “a self-hardening prompt injection detector“. First one of its kind afaik, but surely not the last.
I find it interesting that pieces on AI-privacy such as this one go on about the usual privacy stuff (when you use ChatGPT you can hand over data to OpenAI and so forth), but fail to mention the already happening invasion of privacy when ChatGPT hallucinates lies about me. I already wrote about the incoming defamation lawsuits and now OpenAI faces another defamation suit after ChatGPT completely fabricated a lawsuit.
With ChatGPT, you’ll hack yourself: Use ChatGPT to spread malicious packages: Hackers identify non-existent hallucinated software components in coding suggestions, then upload malware disguised as those components to Github or similar platforms, then the users of ChatGPT simply hack themselves.
And while we’re at it, here’s the ChatGPT Hacker Simulator: Social Engineering.
In Neural Scene Chronology, researchers present a technique to “reconstruct a time-varying 3D model from internet photos“, which is to say: Take many images of landmark locations (Eifel tower and whatnot) and construct a 3D model with a timeline.
I love that they used the 5pointz for this, a legendary graffiti space in New York. I remember a website that provided exactly what that paper did: A timeline with hundreds of images showing the changing fassade of the building over time. There also was an exhibit of macro photography of the many layers of spray paint, symbolizing the years and years of graffiti that crossed and overpainted each other. It’s pretty cool to find these layerings of graffiti in an AI-tech-paper.Render-A-Video is basically style transfer for video based on Stable Diffusion and issues like the flickering in Stable Diffusion animation seems to be a thing of the past. With open source stuff like this we can expect all kinds of movie remixes within a year or so. Imagine E.T. the anime, or finally the Akira real life movie as frame by frame remixes done with sophisticated style transfer. Fan edits of movies are gonna have a fun time with this.
An open source EmoBot in Discord — “it's f'ing ridiculous“. Here’s the thing on Github.
The designers from Keingarten create Neural Posters, which look like a GAN trained on typography and poster design, and then they move through latent space. Reminds me of the experimental typo stuff from the ninetees. They sell them as NFTs on Opensea if that’s your thing. (via
)
If famous musicians were the opposite gender. The female David Bowie is a bit pointless but the male Janis Joplin is just cool. Nice idea.
Fun with Twitters image framing and generative AI.
B3ta outpaints famous artworks without AI. The one above is pretty great if you know the context.
Chances are you’ve seen these Stable Diffusion-QR-Codes at this point. Here’s a control net finetuned on them.
Martin Haerlin is a random guy in AI with a stick:
Bloomberg-piece on how generative AI not just reproduces human biases, but amplifies them. Twitter thread. I like this visualization in which they averaged the generated synthetic faces for certain occupations:
Daniel Dennett writes in the Atlantic on The Problem With Counterfeit People. I’m with him on this one, but have to say that his argument is not very conclusive and how exactly “counterfeit people“ are “the most dangerous artifact in human history“ that will take away our freedom, is not really laid out. Here’s a reply by Tim Sommers on 3 Quarks Daily.
I’m with him because, as i said repeatedly, i suspect the real risks posed by AI are not existential risks of extinction, but risks of psychological influence posed by synthetic humans which are prone to hacking techniques. If this tech keeps advancing at that speed, we’ll have plausible digital twins of humans acting on their behalf in the public, and we should think about what these replicas can and can’t do, asap.As a born and raised hessian, i for one welcome this bembel overlord called 'fortytwo'. Comes in stylistic appropriate Äpplewoi-green.
This piece on Venturebeat about model collapse from synthetic data made the rounds recently. Researchers found that “mistakes in generated data compound and ultimately force models that learn from generated data to misperceive reality even further“. Please note that, at least for computer vision, researchers found synthetic data to improve models.
Related: “We estimate that 33-46% of crowd workers on MTurk used large language models (LLMs) in a text production task“. The next foundational models will be trained on a hybrid synthetic/human corpus, with the synthetic data being edited and embedded in human contexts. Good luck watermarking or identifiying that.
Fordism Comes to the Gallery—and AI Comes for the Artists: “through the use of DALL-E as an image production assembly line, the relationship between artist and image is deconstructed and used as raw material—like parts in a Ford assembly plan — for the manufacture of supposedly new images. Yet the resulting images, forever dependent on the past (...) are actually old, trapping the viewer in a time loop of kitsch, presented as brilliantly new.“
"The ‘Safeguarding the Future’ course at MIT tasked non-scientist students with investigating whether LLM chatbots could be prompted to assist non-experts in causing a pandemic. In 1 hour, the chatbots:
- suggested 4 potential pandemic pathogens
- explained how they can be generated from synthetic DNA using reverse genetics
- supplied the names of DNA synthesis companies unlikely to screen orders
- identified detailed protocols and how to troubleshoot them
- recommended that anyone lacking the skills to perform reverse genetics engage a core facility or contract research organization"
I've seen things...
...seen things you little people wouldn't believe.
Attack ships on fire off the shoulder of Orion bright as magnesium...
I rode on the back decks of a blinker and watched C-beams glitter in the dark near the Tannhäuser Gate.
All those moments... they'll be gone...