DeepMind AI Solves 9 Math Problems, But Not All Complex Ones

Google DeepMind's new AI agent, dubbed AlphaProof Nexus, has reportedly generated formal proofs autonomously for 9 of 353 open problems cataloged by Hungarian mathematician Paul Erdős. A preprint paper describes how the agent combines a Gemini 3.1 Pro model with a Lean compiler. Computational cost varied by problem, with more complex problems costing more, at approximately $200 per solved task.

The AI’s successes are not trivial: some proofs offer novel arguments, such as one for problem #125 that hinges on how close powers of three and four can be (3^m ≈ 4^k). This suggests a capacity beyond mere pattern recognition, venturing into abstract reasoning. The agent can also enlist AlphaProof, a DeepMind reinforcement learning system, for specific sub-tasks. Input to the agent consists of a Lean file containing a theorem with an empty placeholder for its proof.
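To make the input format concrete, here is a minimal Lean 4 sketch of a file that states a theorem and leaves its proof open. The report does not say which placeholder the agent's input files use; `sorry` is assumed here because it is Lean's standard stand-in for a missing proof. The theorem name and statement are toy examples, not one of the Erdős problems.

```lean
-- Hypothetical input file: the statement is fixed, the proof is left open.
-- `toy_statement` is an illustrative name, not taken from the preprint.
theorem toy_statement (n : Nat) : n + 0 = n := by
  sorry  -- placeholder the agent would be asked to replace with a real proof
```

Lean accepts such a file with a warning that the declaration uses `sorry`, so the statement can be type-checked before any proof exists.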

While these findings highlight AI's growing mathematical capabilities, the system's limitations are also evident. The same report notes that the AI can falter, proposing incorrect lemmas or citing results that do not exist, which the Lean compiler rejects. Because the compiler weeds out these erroneous paths, it serves as a form of automated review.
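The generate-then-verify pattern described above can be sketched in a few lines of Python. This is an illustration only, not DeepMind's implementation: `first_verified`, `toy_check`, and `KNOWN_LEMMAS` are hypothetical names, and the checker is a stand-in that merely tests whether every cited lemma exists, where a real system would run the Lean compiler.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def first_verified(
    candidates: Iterable[str], check: Callable[[str], bool]
) -> Tuple[Optional[str], List[str]]:
    """Return the first candidate the checker accepts, plus the rejected ones."""
    rejected: List[str] = []
    for proof in candidates:
        if check(proof):
            return proof, rejected
        rejected.append(proof)
    return None, rejected


# Assumption: a tiny set standing in for the library of results that exist.
KNOWN_LEMMAS = {"Nat.add_zero", "Nat.add_comm"}


def toy_check(proof: str) -> bool:
    """Stand-in for the compiler: accept only if every cited lemma exists."""
    return all(name in KNOWN_LEMMAS for name in proof.split())


accepted, rejected = first_verified(
    ["Nat.made_up_lemma", "Nat.add_zero Nat.add_comm"], toy_check
)
print(accepted)  # the candidate that cites only existing lemmas
print(rejected)  # the candidate that cited a non-existent lemma
```

The point of the design is that the generator never gets to judge its own output: only candidates the external checker accepts count as results.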

Problems requiring the construction of substantial new theoretical frameworks remain largely beyond the agent's current reach. Furthermore, simplified versions of the agent (without evolutionary algorithms, without AlphaProof integration, or running on smaller models) resolved the same 9 problems, raising questions about whether the more elaborate architecture was necessary for these specific breakthroughs.

Earlier iterations of DeepMind’s mathematical AI, such as AlphaProof and AlphaGeometry 2, demonstrated success in specific domains. In July 2024, these systems reportedly solved four out of six problems at the International Mathematical Olympiad (IMO), earning a notional silver medal. Notably, they struggled with combinatorial problems. AlphaGeometry 2 independently resolved one geometry problem.

In a separate instance, an AI agent named Aletheia (also from Google DeepMind) is reported to have autonomously produced a mathematical paper, disproving a decade-old conjecture and identifying an error that had eluded cryptography experts. The extent of human involvement in this particular achievement is not fully detailed, though it is described as a multi-faceted accomplishment with varying degrees of AI contribution.

The overall architecture of these systems, like the Gemini 3.1-based co-mathematician, functions not as a conversational interface but as a structured workspace. This space features an agent hierarchy and a distinct approach to problem-solving:

Rigid programmatic constraints prevent the system from declaring a task complete until tests pass and a reviewer approves the code and outcomes.
Failed attempts are not discarded but preserved as artifacts, informing subsequent workflows and preventing redundant efforts.
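The two rules above (no completion without passing tests and reviewer approval, and failed attempts kept as artifacts) can be sketched as a small data structure. This is a hypothetical illustration under stated assumptions, not the system's actual design: the `Attempt` and `Task` classes and all of their fields are invented for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attempt:
    approach: str
    tests_passed: bool
    reviewer_approved: bool

    @property
    def accepted(self) -> bool:
        # Both gates must hold; neither alone is enough.
        return self.tests_passed and self.reviewer_approved


@dataclass
class Task:
    name: str
    attempts: List[Attempt] = field(default_factory=list)

    def record(self, attempt: Attempt) -> None:
        # Every attempt is kept, including failures.
        self.attempts.append(attempt)

    def is_complete(self) -> bool:
        return any(a.accepted for a in self.attempts)

    def failed_approaches(self) -> List[str]:
        return [a.approach for a in self.attempts if not a.accepted]

    def already_failed(self, approach: str) -> bool:
        # Lets a later workflow skip an approach that was already tried.
        return approach in self.failed_approaches()


task = Task("toy problem")
task.record(Attempt("induction on n", tests_passed=False, reviewer_approved=False))
task.record(Attempt("case split on parity", tests_passed=True, reviewer_approved=False))
print(task.is_complete())        # tests passed once, but no reviewer approval yet
print(task.failed_approaches())  # both attempts remain on record
```

Keeping the completion check as a method over the recorded attempts, rather than a flag the agent can set, mirrors the idea that the constraint is enforced programmatically.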

This approach draws parallels with development tools like Claude Code or Google Antigravity, reoriented towards mathematical outputs rather than software engineering.

Frequently Asked Questions

Q: What new math problems has Google DeepMind's AI solved?

Google DeepMind's new AI agent, AlphaProof Nexus, has reportedly generated formal proofs autonomously for 9 of 353 open problems cataloged by Paul Erdős. It combines a Gemini 3.1 Pro model with a Lean compiler.

Q: How much did it cost for the AI to solve these math problems?

The cost to solve these problems varied, with more difficult ones costing around $200 per solved task.

Q: What kind of math problems can the AI not solve yet?

The AI struggles with problems that need the creation of large new theoretical frameworks. It also sometimes makes errors, like suggesting wrong ideas or referencing results that don't exist.

Q: Does this AI have limits even with its successes?

Yes, the AI can make mistakes and propose incorrect ideas that the Lean compiler rejects. Also, simpler versions of the AI achieved the same 9 problem resolutions, questioning the need for its complex design for these specific successes.
