Can one of the architects of modern software development help unlock AI's true potential?
Chris Lattner created LLVM and ushered in Apple’s Swift era – now he wants to make AI development accessible to all
Modular AI founder Chris Lattner has a storied career, with industry-changing stints at some of the biggest tech companies in the world, including Apple, where he helped create the Swift programming language the company uses across its platforms today. He also created LLVM, the Clang compiler and the MLIR compiler infrastructure, which makes him arguably the foremost authority on low-level software tooling working today.
Modular represents his first foray as a founder, however – he stood up the company alongside Tim Davis, whom he worked with at Google on its AI projects, including TensorFlow and Google Brain. And while Modular was only founded in 2022, Lattner tells me his interest in AI really ramped up nearly a decade ago, towards the end of his tenure at Apple.
“The short version of my story of how I got to Modular was that I built a career at Apple, went up through the developer tools organization and, at the end of my time, was running the developer tools platform, including Swift and many other things. In 2016 I became really interested in AI,” Lattner said. “That’s when [Apple’s] Photos [app] had, you know – you could detect the cats in your images and things like this.”
Lattner points out that this is still where many people get their introduction to AI, despite the prevalence of tools like OpenAI’s ChatGPT – in the automatic detection of beloved pets or family members within apps that have a massive install base, like Photos on the iPhone.
“The thing that I found fascinating about that is that I didn’t know how to code that,” he said. “And being a developer-focused person, well, I don’t even know how to express that – like, how do you do that? You can’t write that in code, right? So I became really interested in this whole deep learning thing.”
This fascination is one I’ve heard expressed in different ways by people from technical, developer or engineering-oriented backgrounds over the past few years: there’s a fundamental difference between surfacing a result to an end user via rule-based, classical programming and having it emerge from the output of a deep learning model. Helping to articulate, illuminate and guide that second kind of result ends up being catnip for many of those whose lives have been focused on stretching the boundaries of what’s possible with deterministic code.
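To make that contrast concrete, here’s a minimal sketch in Python of what “detecting the cats in your images” looks like when you lean on a learned model instead of hand-written rules. It assumes torchvision is installed, uses a placeholder image path, and is purely illustrative – it isn’t drawn from Lattner’s or Apple’s actual implementation:

```python
# A minimal sketch: recognizing a cat with a learned model rather than
# hand-written rules. Assumes torchvision is installed and "photo.jpg"
# (a placeholder path) exists on disk.
import torch
from PIL import Image
from torchvision import models

# Load a pretrained classifier -- nobody wrote "if whiskers then cat";
# the behavior was learned from data.
weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()

# Preprocess the image the same way the model was trained.
image = weights.transforms()(Image.open("photo.jpg")).unsqueeze(0)

with torch.no_grad():
    probabilities = model(image).softmax(dim=1)

# Map the top prediction back to a human-readable label.
label = weights.meta["categories"][probabilities.argmax(dim=1).item()]
print(label)  # e.g. "tabby" for a cat photo
```

The notable thing is what’s absent: there is no branch of logic anywhere that encodes what a cat looks like – that knowledge lives entirely in the learned weights.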
“At the time, let's just say Apple was not super interested in AI and didn't have its act together,” Lattner told me. “You can extrapolate to what they've been doing now – I don't know – but come 2016, that was not their focus, and so I decided to move on.”
After nearly 12 years at Apple, Lattner says he embarked on a sort of “hero’s journey” to better understand AI technology, the state of the art, and the potential it might hold. His first stop was Tesla, where he worked on its Autopilot system – a firsthand look at the difficulties of deploying applied AI in production, using older technologies like Caffe alongside more modern tools like TensorFlow.
It was a short but intense stint at Tesla, but one that contained an entire education in the hurdles then facing anyone trying to put AI out into the world in a way that actually works for real people in their daily lives. After seeing the front lines, Lattner reflected on which other aspects of the AI deployment stack attracted his interest.
“When I moved on [from Tesla], I was trying to decide ‘Where do I want to go, how do I want to focus?’ and I decided that the systems part of AI was really the thing that excited me,” Lattner said. That led him to Google, which at the time was working on a new AI accelerator chip, the ‘tensor processing unit,’ or TPU. He signed on to get its software functional, launch it for cloud deployment, and turn it into a product that Google could sell. The TPU would become “the first successful at-scale non-NVIDIA AI chip” under Lattner’s guidance.
While many watching the tech industry’s progress over the years would cite OpenAI’s debut of ChatGPT in 2022 as the watershed moment in modern AI’s development, under the hood there was a ton of transformative groundwork being laid – in all of which Lattner was essentially elbow-deep. This vantage point only reinforced for him just how much work was required to accomplish anything truly groundbreaking in AI.
“What I saw is that Google was able to do amazing things, but it was only able to do amazing things because they had built all of the software, and they had built all of the things,” he explained. “And whenever you needed to do something new, it required all the experts to come in and then engineer something from scratch. Outside in the normal world of software engineering, you don't do that – you massively reuse and you build on top of other people's work.”
The amount of work Lattner witnessed Google pour into introducing novel features was out of reach for the vast majority of companies on cost alone. And while TensorFlow was built as a reusable AI infrastructure component, in reality, Lattner says, you had to fundamentally change it every time you wanted it to do anything new.
While the toolset around AI development has since changed, with PyTorch essentially winning out over TensorFlow as the AI developer lingua franca, it’s still not something you can easily use off the shelf when building ambitious new products and use cases.
“If you look at how people are using PyTorch today, you have to know how all the internals of these things work and you have to go change it, which, outside of Google and Facebook, nobody has that expertise,” Lattner said. “Who can hire a compiler engineer? Who can write CUDA kernels? It’s possible, but it’s very difficult to attract the talent, and it’s a very exclusive skillset.”
“This is why today only the biggest organizations in the world are really capable of pushing the state of the art – it’s the OpenAIs or the Anthropics or the Googles and the Microsofts and these companies, like the top tier,” he adds. “But everybody should be using AI tech, and should be building it into their product. So there’s this massive gap in terms of the way things should be and what’s actually happening.”
One big reason for the huge discrepancy between expectations and reality when it comes to generative AI is that the infrastructure behind it is actually quite old, Lattner points out. TensorFlow launched in 2015 and PyTorch in 2016 – while ChatGPT itself is just two years old at this point. Even the term ‘generative AI’ only came into general parlance a little over a year and a half ago.
Meanwhile, there’s a rush to bring new hardware to market to establish a viable alternative to NVIDIA for that part of the stack. “What’s happening is the world’s trying to bring in more and more chips – but it’s not about the hardware, it’s about the software, because what matters is, can developers use it? And none of these companies have CUDA – not even Google,” Lattner said.
“What I realized is that this is a software problem, this is a developer problem – and this is not an NVIDIA problem, or an AI-chip-of-the-day problem, it’s an industry-wide problem,” he explained. “When I looked at this and said, ‘Who’s working on solving this massive problem?’ – because it’s not like there’s going to be fewer chips in the future – when we looked around, there was nobody working on this. What you see is every hardware company building their own software stack, because they have to.”
Meta, Google and many of the other companies building the foundation layer of the AI stack are building their own chips and solving their own problems, Lattner points out. And while he believes they really do want others to use their stuff, they aren’t specifically investing in that direction, nor can they build out robust support for it, given it’s not their primary focus. Nor is any of it built for the new chips coming online that aim to be a more broad-based alternative to NVIDIA GPUs.
Lattner and Davis saw an opportunity to capitalize on their combined experience (Davis focused more on TensorFlow Lite and on-device model development and deployment at Google, so they have that covered, too) and to learn from past mistakes – taking a bottom-up, hard-tech approach to build the “2.0 version” of all that came before, from TPUs, to TensorFlow, to PyTorch, to Microsoft’s ONNX and beyond.
“Let’s build it in a production-quality way,” Lattner explains regarding their approach at Modular. “All these things are built by research teams – that’s not reliable. Let’s actually tackle this like a software engineering project instead of a research project.”
That led to Modular’s flagship product, MAX, as well as the creation of a new Python-family language called Mojo, designed from the ground up to retain Python’s accessibility while adding new compiler technology and other elements that make it fit for low-level AI hardware programming – without requiring any messy incorporation of C++ or CUDA. MAX includes a graph compiler and runtime library designed to be performant and efficient across current and future AI models and hardware, including both GPUs and CPUs. Together, they represent what Lattner and Modular believe will be a key unlock in scaling generative AI development and deployment in a way that’s accessible and sustainable for companies of all stages and sizes.
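For a flavor of what that Python-like accessibility looks like alongside systems-level control, here’s a tiny sketch based on Mojo’s early public syntax – the function and values are invented for illustration, and the language has continued to evolve, so treat this as indicative rather than definitive:

```mojo
# An illustrative taste of Mojo: Python-style syntax with explicit,
# systems-level types (based on early public Mojo; details may differ).
fn dot(a: SIMD[DType.float32, 4], b: SIMD[DType.float32, 4]) -> Float32:
    # Elementwise multiply, then a horizontal sum -- SIMD arithmetic
    # that compiles down to vector instructions without C++ or CUDA.
    return (a * b).reduce_add()

fn main():
    var a = SIMD[DType.float32, 4](1, 2, 3, 4)
    var b = SIMD[DType.float32, 4](4, 3, 2, 1)
    print(dot(a, b))  # 20.0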
It’s a massive undertaking and an ambitious goal – but one that, if Modular succeeds, could position the company as an essential piece of just about every company’s tooling stack, given the trajectory of generative AI to date and its continued importance not just to tech companies, but to any company that uses tech (which, of course, is all of them).
“The bet is also that it’s going to continue to get more important,” Lattner said. “And it seems like it has to, if we’re going to get to the point where everybody wants to make it, where you can use it at every scale of enterprise, and in every device. You want to be able to have the fundamental programmability we’ve had for years with the PC, and we need that for AI also.”