xAI's Grok 3 and jumping into the deep end
Elon Musk's latest model has dropped, and early indications are that it performs at or near the state-of-the-art. Plus, it goes deep
At this point in the AI arms race, a few new capabilities are considered table stakes even as they remain the bleeding edge of the state of the art. Chain-of-thought reasoning is one; the other is ‘deep research,’ which goes by different branding depending on which model maker you go to, but which amounts to an undergraduate-level introductory survey paper on whatever topic you want to learn more about.
xAI’s Grok 3, which is available now for Premium+ members, has been top of mind for Elon Musk – as much as anything is able to occupy his attention for long these days, while he’s busy gutting the US government and now apparently also trying to take on ‘woke’ gaming with his own studio.
Rolling a few products into one, Grok 3 now offers a “DeepSearch” option for queries. It resembles both a search engine-style tool for sourcing information from the web, like OpenAI’s search feature and Perplexity, and the more thorough ‘deep’ research modes that both offer. In practice, Grok 3’s implementation is much faster than those two competitors’, but it also seems to produce a more high-level, casual overview (though that impression might be an artifact of how quickly it finishes and how it narrates its process and current focus, more than of the actual result).
Grok also now has a reasoning mode called ‘think’ that checks its answers as it goes, but it doesn’t articulate its thoughts the way some competitors do – an intentional decision that Musk says is meant to prevent ‘distillation’ by competitors like DeepSeek. One of the prevailing theories about how DeepSeek achieved its incredible cost/performance efficiencies is that it distilled from OpenAI’s most powerful models – that is, it used a large, complex model’s answers to train a much smaller one with relatively little quality loss.
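To make the distillation idea concrete, here is a minimal, self-contained sketch of the classic soft-label objective: the student is trained to match the teacher's full output distribution (softened by a temperature) rather than just its top answer. All function names and logit values here are hypothetical, illustrative choices, not anyone's actual training code.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 softens the distribution, exposing the teacher's
    # relative preferences across answers, not just its single best one.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # Cross-entropy between the softened teacher distribution and the
    # student's softened distribution: the core of soft-label distillation.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

# Toy logits over three candidate answers (hypothetical values):
teacher = [4.0, 1.0, 0.2]
close_student = [3.8, 1.1, 0.3]   # mimics the teacher
far_student = [0.2, 1.0, 4.0]     # disagrees with the teacher
```

A student whose logits track the teacher's incurs a lower loss than one that disagrees, which is why withholding the full articulated reasoning trace deprives would-be distillers of useful training signal.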
Musk claims that Grok 3 is the best AI chatbot in the world, but whether that’s accurate depends on who you ask. It is definitely receiving some favorable reviews, and it has performed well on benchmarks thus far. Andrej Karpathy, a member of OpenAI’s founding team and a former director of AI at Tesla, has a good rundown of how he found it performed relative to the other state-of-the-art models available right now. Consensus seems to be that OpenAI’s best models still outperform xAI’s best efforts, however.
I’m less interested in how Grok 3 stacks up to the other AI chatbots out there, and more in what this release tells us about the direction of the industry and what matters (at least in the eyes of the AI foundation model companies) when it comes to product iteration.
It’s definitely clear by now that all these companies will need to offer the ability for their tools to ‘think,’ ‘search’ and ‘research.’ Consider this relative to the original launch state of GPT-3.5-powered ChatGPT, which for the purposes of this article we can call the starting point of this product category. The original ChatGPT could do none of these things in the way we understand them today.
There’s a desire and a tendency to dismiss these new implementations of generative AI as mere ‘repackaging’ efforts that obscure scaling limits, or that minimize and mask, but don’t eliminate, fundamental challenges in the tech like hallucination. These takes underestimate the real utility of reasoning and research behavior equipped with real-time access to current information, often in order to confirm a pre-existing bias: that because AI can’t directly replicate human thought and intuition, it’s therefore useless.
Using these new incarnations of AI, there’s no question that they’re massively useful, and that they’re far less susceptible to the failings of earlier versions. Most importantly, these gains were realized only in part by scaling the resources expended in their development and during their use; recombining the elements involved, and adding new and innovative pieces to complement the existing recipes, have made all the difference. And they will continue to – next year’s capability set is probably just as difficult to predict as this year’s baseline was at the outset of 2024, which is incredibly exciting.