This article is about video so let’s start with a video. A video is worth 24,000 words per second. Look at this (just look at it!) –
This video was generated by AI guided by a loving human hand. It skips plot or characterisation, instead being absolutely gorgeous. It’s a feast for the eyes, isn’t it? Regardless of whether my philosophical arguments in this essay are true, false, or seditious, the video’s delicious and eyeluscious. That’s the only thing I came here to say.
Videos like that have been bubbling up since October of 2024; with no great hunting I found 50 similar channels. What flow of technical progress has brought us to this point? What cultural waters merged with tech to form bubbling rapids?
Video-generation is just the right technology for its time, because in the beginning was the web. It was a way of exchanging text until the 18th of July 1992, when someone uploaded a picture of a girl-rock band: the first picture on the web. The next year, 1993, the first video was streamed over the internet. (It was streamed at a framerate of 2 frames-per-second, because that was all that was possible.)
Bandwidth broadened and video encoding smallened. Both Google Video and Youtube found their berth in this broad stream at the same time in the first 45 days of 2005. Video gobbled up the attention we used to give to text. (“Television kills telephony in brothers’ broil. Our eyes demand their turn.”) Video ate the internet. The shift from reading-the-web to watching-the-web is something I remember quite well. (That is not a good thing: I would prefer if ye were all reading books, but people don’t listen to me about such things.)
Something else was going on in the meantime. Culture changed in such a way that stories with morals lost credibility, and vibes and æsthetics swelled into the vacuum. When I was a child, I had a book about Robin Hood printed in England in maybe the 1940s. The heroes were brave, noble, handsome, unwavering, and tall, and the villains warty and swarthy and cowardly and mean (even foreign). Anglican moral clarity was the pillar of the genre. Panavision screens were filled with tales of good & evil, queen & country.
That all crumbled in the modernist era (1901-1945). Britain’s control crumbled and the globe and the arts swirled with a mist of hostile Celtic relativism. James Joyce and Flann O’Brien viewed claims to objective truth as an arm of civilised oppression (they even put pages of their books towards attacking Euclid – the fellah was a bit too sure of himself.) In philosophy, existentialists and nihilists told their readers to confront what they liked to call ‘the absurd’. Camus & company emphasised that we cannot escape the absurd. We must enjoy human life with its absurdity, rather than trying to make it make sense.
Hold that thought. Let’s pause for another video –
How does that video relate to the rational and the absurd? Oscar Wilde said, “the telling of beautiful untrue things is the proper aim of Art.” The AI-generated videos mushrooming up on Youtube these past four months care about beauty and nothing else.
After Camus came the internet age. Thousands of different idiotlogies shouting over each other. People telling outright lies and hiding behind anonymity. Powerful and organised spies pulling strings and adding malicious chaos on top of the random kind. Ah god the rational never had a chance did it?
I’m not trying to say that we lost track of truth and must get it back. I am saying that we lost track of truth and stopped caring.
The next landmark I gesture at on our tour is in March 2020. The heroes of that moment, probably teenagers, launched Æsthetics wiki. They appointed themselves entomologists of vibes. It was the natural thing to do in the post-truth era. Young people today turn their attention first and foremost to the æsthetic – that is more important than the message or the moral. The post-truth world is the playground of the absurd.
Those were the conditions: a video-first internet wherein nobody cares about anything except appearances. Then, four months ago, neural nets that can generate video were born. They were dream-machines born into a dream, looking for gorgeous entertainment. Just look at this –
Some channels are what I call ‘vibes reels’, devoted to devotchas and dreamscapes, just floating along. Nothing else happens and nothing else needs to. Steampunk vibes are very popular. (Aside: I am forced to conclude that goggles replaced handheld phones in Steampunk Universe. The ubiquity of phones here is the ubiquity of goggles there. No other explanation for why so many goggles.) 1950s Americana and the look of 1950s American sci-fi are very popular. Those two are far too popular really, so the ’50th century Steampunk’ that splashed us into the article is commendable. The channel ‘Cyborg Nation 3026‘ is also admirable in exploring more unique æsthetics, but its quality is lower.
The second genus is Fake Trailers. Fake trailers re-imagine pop culture films and computer games as (normally 1950s-style) trailers. ‘Abandoned Films‘ claims (correctly, by my little detectiving) to be the original trailerfaker. Then came Cyberithm, Warden Cinematics, and SPLITMIND FILMS.
There are AI-generated music videos for AI-generated music. The ‘Gamesongs AI‘ channel states that “All music was created with Udio Ai”. A deeper discussion of musicmaking neural nets gets a bit beyond the spectroscope of today’s lesson, but (remember this phrase:) just look at it!
The limitations of AI video (as of January 2025) are obvious: the characters can’t talk, and can’t complete real actions. Anything more complex than taking their bronze steampunk goggles on/off is glitchy. If a beautiful sky-pirate presses a button, her finger might not hit the button.
These limitations mean neural nets can’t currently make a video with a coherent plot. Is this really a limitation? One of the 10 highest-grossing films of 2024 was Beetlejuice Beetlejuice, and that doesn’t have a coherent plot either.
I am confident these problems will get better. Already we see Kling, one of the leading companies in the field, releasing neural nets that go from movies to talkies:
Let’s be still for a paragraph or two. This past year, I have had a grand old time playing with AI image-generators. I start with an idea in my mind’s eye, then the prompt goes through several versions, and Stable Diffusion has a dozen dials to twiddle. It takes 20 minutes to get all that right, and then the game-of-chance begins: a few come out bad, most acceptable, and every so often there is something wonderful. It’s fun, and it feels good to hit the jackpot. The creation at the end impresses people.
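That twiddle-the-dials-and-pull-the-lever loop can be sketched in a few lines of Python. This is a toy sketch, not Stable Diffusion itself: the `generate` function, the scoring, and the parameter names are all stand-ins I invented to show the shape of the lottery (sweep the seeds, sweep a dial or two, keep the jackpot).

```python
import itertools
import random

# Hypothetical stand-in for a real image-generator call (e.g. a Stable
# Diffusion pipeline). The "aesthetic score" it returns is pure illustration.
def generate(prompt: str, seed: int, guidance_scale: float) -> float:
    rng = random.Random(seed * 100 + int(guidance_scale * 10))
    return rng.random()  # pretend this rates how good the image came out

prompt = "a 50th-century steampunk sky-pirate, golden hour"
seeds = range(8)                    # the lottery tickets
guidance_scales = [5.0, 7.5, 10.0]  # one of the "dozen dials"

# Run every (seed, dial) combination and record the results.
runs = [
    (seed, cfg, generate(prompt, seed, cfg))
    for seed, cfg in itertools.product(seeds, guidance_scales)
]

# Keep the jackpot, discard the bad and the merely acceptable.
best = max(runs, key=lambda r: r[2])
print(f"best run: seed={best[0]}, guidance={best[1]}")
```

In real use, each run costs seconds-to-minutes of GPU time rather than microseconds, which is why the game-of-chance feels like a slot machine and not a spreadsheet.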
I was never inclined to learn Photoshop or Krita, but with neural nets (free and open-source ones) I produce images that impress people, and have fun doing it. The tall towers where visual artists lived were brought down to my slum and I can make art.
Video creators use either text-to-video (the video is made from a prompt) or image-to-video. I see five AI models mentioned in the video descriptions and comments:
- Luma
- Sora (made by OpenAI)
- Kling
- Hailuo by Minimax
- Runway Gen 3
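The difference between the two modes comes down to one field in the request: image-to-video anchors the clip to a starting still, text-to-video conjures it from words alone. Here is a minimal sketch of that distinction; to be clear, this is a hypothetical request shape of my own devising, not the actual API of Luma, Sora, Kling, Hailuo, or Runway.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical request shape -- none of the real services listed above
# are being called here; this only illustrates the two generation modes.
@dataclass
class VideoRequest:
    prompt: str
    init_image: Optional[str] = None  # path or URL of a starting still, if any

    @property
    def mode(self) -> str:
        return "image-to-video" if self.init_image else "text-to-video"

# Text-to-video: the whole clip comes from the prompt.
print(VideoRequest("a dieselpunk airship at dawn").mode)

# Image-to-video: the clip animates outward from a supplied still.
print(VideoRequest("the airship, camera pans left", init_image="airship.png").mode)
```

Creators often prefer image-to-video in practice: you can polish the still in an image generator first, then let the video model only handle the motion.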
Imagine Art Films released a non-fiction, discursive video on January 9th comparing the models. They call Luma the worst. Kling and Sora are very good. Runway has high quality, but the videos lack motion (which is good for some cases, like an establishing-shot.)
Video generation is about three or four years behind images. But filmmaking (the unintelligent sort) was in much higher towers on the hills than image-making. Spending five million to make a feature film was considered frugal. Now we have Youtubers making video – with love – for a few thousand dollars, and it looks sumptuous, my dear. Without these tools, the art never would have been made.
As I write this, a wildfire is burning down Hollywood. Isn’t that interesting? Some people fret that AI-video-generation might end Hollywood. But maybe I want the end of Hollywood, which I view as an arm of civilised oppression. Below these short videos someone always comments: “I would love to see a feature-length version”. That shows an appetite for destruction.
The limitations of generative AI (like fingers or whatever) are fixable. Let’s talk about the limitations of Hollywood: 99% of people can never get together the resources to make a film. Massive budgets are required; only the stories of the rich can be told. When that much money is on the line, studios become risk-averse, preferring capeshit and remakes to being experimental. Of the top 40 most popular movies of 2024, 38 were from one country (2 from Japan): the country with the money.
Reduce the resources required to make a film ten-thousand-fold, and see different kinds of films: films made with less capital and more love. Entertainment can be more varied, more customised. Films from a million GPUs – films about space-squid, the Fenian Cycle, the psychogeography of Cricklewood – can create enough luminous intensity to bring down Hollywood. Generative AI excels at being weird; Hollywood was horribly normal. Hollywood used to excel at making things look polished, but they lost that edge four months ago.
“In Art, the public accept what has been, because they cannot alter it, not because they appreciate it.” Generative AI changes that. Video is the dominant form of entertainment and the public are now in control.
Let us know your thoughts! Sign up for a Mindplex account now, join our Telegram, or follow us on Twitter.