OpenAI's new image generator changes everything
Legible text, realistic rendering, and amazing style transfer mark the start of a new era in generative AI
AI image generators have been impressive for quite a while, and iterative improvements in things like Stable Diffusion have added up to a fairly capable creative tool, but one with very clear, hard-to-escape limitations. This week, OpenAI launched a new image generation tool built into its GPT-4o model in ChatGPT, which leverages technology developed for its Sora video generation tool. It’s a little bit slower than the DALL-E model it replaces – but in exchange for a slightly longer wait, you get vastly improved generation capabilities that avoid a lot of the downsides of AI-based image generation that previously seemed unavoidable.
Social media was flooded with renders created with OpenAI’s new image generation tool after its official launch on Tuesday, and the majority of them were the result of so-called ‘style transfer’ – applying a specific aesthetic or set of associated visual characteristics to an existing image. During its launch stream for the new image generation features, OpenAI’s own staff, including Sam Altman, demonstrated a style transfer example live, using a ‘Studio Ghibli’-inspired aesthetic to transform a selfie shot during the presentation.
Studio Ghibli is the iconic Japanese animation studio behind classics including Spirited Away, Kiki’s Delivery Service, My Neighbor Totoro and Princess Mononoke, to name just a few. Its best-known director and co-founder is Hayao Miyazaki, whose thoughts on artificial intelligence and its use in creative endeavors went viral about a year ago, and are circulating again now given the irony that the signature look and feel of his studio’s visuals has become the hallmark of a new era of capability in AI image generation. In case you haven’t seen them, Miyazaki is very much not in favor of using AI to generate creative works.
As a photographer and artist myself, and someone with a lot of close personal friends who are also working artists, illustrators and photographers, I find products that can potentially replace huge swaths of their livelihoods viscerally repulsive. I’m also obviously, at least in part, still a professional writer by trade. Leaps in performance and capability like the one made by OpenAI this week with 4o image generation are shocking, painful and existentially daunting in a way that no amount of intellectualizing or silver-lining thinking can entirely ablate.
Conversely, there’s an essential type of joy that comes through when people use these tools, especially when they themselves don’t possess the kind of artistic talent that would let them produce anything approaching this level of quality on their own. Exceptional talent and skill are obviously core human traits, and ones to be celebrated – but so, too, is the democratization of domains and experiences that once had a high barrier to entry, but that are made more accessible through technological advances.
The fact is that with its new image generation tools, OpenAI has crossed a threshold whereby AI-powered visual asset generation can now do, in minutes, a significant part of what used to take a professional hours to accomplish working with dedicated software like Illustrator, Canva and Photoshop. Past improvements have been more incremental, delivering slow but steady gains while keeping the point at which the tooling could replace significant portions of the creative workflow pretty far out on the horizon.
OpenAI’s 4o image generator will be transformative not so much because it replaces the need for a creative professional and attendant software in so many contexts, but because it will actually end up adding an entire new layer of abstraction on top of that process. Just like the modern visual operating system simplified the concepts of interacting with a computer vs. text-based system UI configurations like MS-DOS, so, too, will OpenAI’s current and subsequent iterations of visual generation tools radically change how we think about where that sits in the stack of modern user-facing applications.
'Just like the modern visual operating system simplified the concepts of interacting with a computer vs. text-based system UI configurations like MS-DOS, so, too, will OpenAI’s current and subsequent iterations of visual generation tools radically change how we think about where that sits in the stack of modern user-facing applications.' wow.
What are your thoughts on attribution for the art styles that institutions like Studio Ghibli have developed over decades, and for which they even hold IP protection?
The cat's out of the bag now; it's unlikely that OpenAI would "un-train" the 4o model even if they're sued by the artists whose art styles they've borrowed (mostly without the artists knowing, I reckon).