DeepSeek Unveils DeepSeek-Prover-V2: Advancing Neural Theorem Proving with Recursive Proof Search and a New Benchmark

DeepSeek AI has announced the release of DeepSeek-Prover-V2, an open-source large language model designed specifically for formal theorem proving in the Lean 4 environment. The new model builds on previous work by introducing a recursive theorem-proving pipeline that leverages DeepSeek-V3 to generate its own high-quality initialization data. The resulting model achieves state-of-the-art performance in neural theorem proving and is accompanied by the introduction of ProverBench, a new benchmark for evaluating mathematical reasoning capabilities.

A key innovation of DeepSeek-Prover-V2 is its cold-start training procedure.
The process begins by prompting the powerful DeepSeek-V3 model to decompose complex mathematical theorems into a series of more manageable subgoals. At the same time, DeepSeek-V3 formalizes these high-level proof steps in Lean 4, effectively producing a structured sequence of sub-problems. To handle the computationally intensive proof search for each subgoal, the researchers used a smaller 7B-parameter model. When all the decomposed steps of a challenging problem are successfully proven, the complete step-by-step formal proof is paired with DeepSeek-V3's corresponding chain-of-thought reasoning. This technique lets the model learn from a synthesized dataset that combines informal, high-level mathematical reasoning with rigorous formal proofs, providing a strong cold start for subsequent reinforcement learning.

Building on the synthetic cold-start data, the DeepSeek team curated a selection of challenging problems that the 7B prover model could not solve end-to-end, but for which all subgoals had been proven successfully.
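As a hedged illustration (this example is not taken from DeepSeek's data; the announcement does not include sample proofs), a Lean 4 decomposition of a simple theorem into subgoals might use `have` statements, where each subgoal plays the role of a lemma the small prover can attack separately:

```lean
-- Hypothetical example of subgoal decomposition in Lean 4 (with Mathlib).
-- Each `have` is a sub-problem that could be proven independently.
theorem sum_sq_nonneg (a b : ℝ) : 0 ≤ a ^ 2 + b ^ 2 := by
  have h₁ : 0 ≤ a ^ 2 := sq_nonneg a   -- subgoal 1
  have h₂ : 0 ≤ b ^ 2 := sq_nonneg b   -- subgoal 2
  exact add_nonneg h₁ h₂               -- combine the subgoal proofs
```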
By combining the formal evidence of these subgoals, a complete evidence for the original issue is built
This formal evidence is then linked with DeepSeek-V3s chain-of-thought describing the lemma decay, producing a combined training example of
informal reasoning followed by formalization.The prover model is then fine-tuned on this synthetic information, followed by a support
learning phase
This phase uses binary correct-or-incorrect feedback as the benefit signal, even more refining the designs capability to bridge the space
between informal mathematical instinct and the exact building and construction of formal proofs.The conclusion of this ingenious training
procedure is DeepSeek-Prover-V2671B, a design boasting 671 billion parameters
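A minimal sketch of such a binary reward, assuming a hypothetical `lean_check` verifier interface (DeepSeek's actual RL implementation is not described in the announcement):

```python
def lean_check(proof: str) -> bool:
    """Stand-in for a Lean 4 verifier. This placeholder merely rejects
    proofs containing the `sorry` placeholder; a real implementation
    would invoke the Lean compiler on the candidate proof."""
    return "sorry" not in proof

def binary_reward(candidate_proof: str) -> float:
    """Binary correct-or-incorrect feedback: 1.0 if the proof checks, else 0.0."""
    return 1.0 if lean_check(candidate_proof) else 0.0

print(binary_reward("exact add_nonneg h1 h2"))  # → 1.0
print(binary_reward("sorry"))                   # → 0.0
```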
The model has achieved exceptional results, demonstrating state-of-the-art performance in neural theorem proving: an 88.9% pass ratio on the MiniF2F-test, and 49 problems solved out of 658 on PutnamBench. The proofs generated by DeepSeek-Prover-V2 for the miniF2F dataset are openly available for download, permitting further examination and analysis.

In addition to the model release, DeepSeek AI has introduced ProverBench, a new benchmark dataset comprising 325 problems.
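The composition of those 325 problems, detailed next, can be summarized with the figures stated in the announcement:

```python
# ProverBench composition as stated in the announcement.
proverbench = {
    "AIME 24/25 (competition-level)": 15,
    "textbook and tutorial problems": 310,
}
total = sum(proverbench.values())
print(total)  # → 325
```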
The benchmark is designed to provide a more thorough evaluation of mathematical reasoning abilities across different levels of difficulty. ProverBench includes 15 problems formalized from recent AIME (American Invitational Mathematics Examination) competitions (AIME 24 and 25), providing genuine challenges at the high-school competition level. The remaining 310 problems are drawn from curated textbook examples and educational tutorials, offering a diverse and pedagogically sound collection of formalized mathematical problems spanning many areas. ProverBench aims to facilitate a more comprehensive assessment of neural theorem provers across both challenging competition problems and foundational undergraduate-level mathematics.

DeepSeek AI is releasing DeepSeek-Prover-V2 in two model sizes to suit different computational budgets: a 7B-parameter model and the larger 671B-parameter model.
DeepSeek-Prover-V2-671B is built on the robust foundation of DeepSeek-V3-Base. The smaller DeepSeek-Prover-V2-7B is built upon DeepSeek-Prover-V1.5-Base and features an extended context length of up to 32K tokens, allowing it to process longer and more complex reasoning sequences.

The release of DeepSeek-Prover-V2 and the introduction of ProverBench mark a significant step forward in the field of neural theorem proving. By leveraging a recursive proof-search pipeline and introducing a challenging new benchmark, DeepSeek AI is empowering the community to develop and evaluate more advanced and capable AI systems for formal mathematics.

Link: https://huggingface.co/deepseek-ai/DeepSeek-Prover-V2-671B