
Deva Darshan
OpenAI has emerged victorious over Elon Musk’s xAI in the final of a highly anticipated tournament designed to determine the best artificial intelligence (AI) chess player among general-purpose language models.
While chess has long been used to measure a machine’s capabilities, this tournament broke with tradition by excluding dedicated chess engines. Instead, it featured AI models built for everyday use, not specifically trained for the game.
OpenAI’s o3 model triumphed without a single defeat throughout the competition, ultimately besting xAI’s Grok 4 in the final. The result has intensified the ongoing rivalry between the two AI giants. Both Sam Altman, CEO of OpenAI, and Elon Musk, head of xAI and co-founder of OpenAI, have claimed their respective models are among the most intelligent in the world.
In a closely fought third-place match, Google’s Gemini defeated a different OpenAI model, securing a podium finish in the three-day tournament.
However, despite their wide-ranging abilities, these large language models remain far from reliable chess players. Grok, in particular, faltered during the final, making a series of mistakes, including repeatedly losing its queen, that cost it crucial games.
“Up until the semi-finals, it seemed like nothing would stop Grok 4 on its path to victory,” wrote Pedro Pinhata of Chess.com, which covered the event in depth.
“Despite a few shaky moments, xAI’s model appeared to be the strongest player in the tournament… but that illusion was shattered on the final day.”
Pinhata noted that Grok’s play in the final became “unrecognizable,” describing it as filled with costly blunders that allowed o3 to rack up a string of convincing wins.
“Grok made so many mistakes in these games, but OpenAI did not,” observed chess grandmaster Hikaru Nakamura, who streamed the final match and analyzed the performance live.
Ahead of the final, Musk attempted to downplay Grok’s previous victories, posting on X (formerly Twitter) that the model’s performance was merely a “side effect,” as it had “spent almost no effort on chess.”
Why Are AI Models Playing Chess?
The tournament was hosted on Kaggle, a Google-owned platform that enables data scientists to benchmark AI models through structured competitions.
Over the course of the event, eight major large language models (LLMs) from companies including Anthropic, Google, OpenAI, xAI, and Chinese developers DeepSeek and Moonshot AI faced off in a knockout chess bracket, culminating in semi-finals, a final, and a third-place match. These models were evaluated not as specialized chess programs, but for their strategic reasoning and adaptability.
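The structure described above, eight entrants narrowed down through semi-finals to a final and a third-place match, is a standard single-elimination bracket. As a rough sketch of how such a bracket proceeds (the actual seeding and match rules are not detailed here; the entrant names beyond the three reported and the toy winner rule are placeholders):

```python
def run_knockout(players, decide_winner):
    """Simulate a single-elimination bracket.

    players: list whose length is a power of two.
    decide_winner: callable taking (a, b) and returning the winner.
    Returns (champion, rounds), where rounds lists each round's pairings.
    """
    assert len(players) & (len(players) - 1) == 0, "bracket size must be a power of two"
    rounds = []
    while len(players) > 1:
        pairings = list(zip(players[::2], players[1::2]))
        rounds.append(pairings)
        players = [decide_winner(a, b) for a, b in pairings]
    return players[0], rounds

# Three entrants named in the coverage; the rest are placeholders.
entrants = ["o3", "Grok 4", "Gemini", "Model D",
            "Model E", "Model F", "Model G", "Model H"]

# Toy decision rule purely for illustration: alphabetically-first name wins.
champion, rounds = run_knockout(entrants, lambda a, b: min(a, b))
print(len(rounds))                   # 3 rounds for 8 entrants
print(sum(len(r) for r in rounds))   # 7 matches in total
```

With eight entrants, the bracket always takes three rounds and seven matches, since each match eliminates exactly one player.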
In the world of AI, benchmarks are critical tools used to assess how models perform on tasks such as logic, code generation, and problem-solving. Chess, with its fixed rules and demand for long-range planning, offers a clear, measurable framework for testing whether a model can follow constraints, reason several moves ahead, and convert an advantage into a win.
Historically, chess and other strategy games have served as key indicators of AI progress. In the 1990s, IBM’s Deep Blue made headlines by defeating world chess champion Garry Kasparov, a moment seen as a milestone in human-computer competition.
Reflecting on the defeat two decades later, Kasparov famously remarked that “losing to a $10 million alarm clock did not make me feel any better,” likening the computer’s intelligence to that of a basic timekeeping device.
Another milestone came with AlphaGo, developed by Google’s DeepMind, which dominated the ancient Chinese game of Go, defeating top human players and pushing the boundaries of what AI could achieve in strategy games.
In 2019, after several losses to AlphaGo, legendary South Korean Go player Lee Se-dol retired, stating, “There is an entity that cannot be defeated.”
Notably, DeepMind co-founder Sir Demis Hassabis—who played a major role in AlphaGo’s development—is himself a former chess prodigy, further illustrating the deep connection between AI and strategic board games.