DeepSeek releases V4-Pro and V4-Flash on Hugging Face: open-weight models with 1M context under the MIT license, led by the 1.6T-parameter V4-Pro
DeepSeek has released DeepSeek-V4-Pro and DeepSeek-V4-Flash on Hugging Face, its most capable open-weight models to date. V4-Pro is a mixture-of-experts model with 1.6 trillion total parameters, of which 49 billion are active, and a one-million-token context window. It is released under the MIT license and available for public download.
What's new
DeepSeek-V4-Pro is built around a hybrid attention architecture combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), designed to sustain performance across extremely long contexts without proportional compute overhead. The model introduces manifold-constrained hyper-connections and uses the Muon optimizer during training.
Key benchmark results:
- MMLU-Pro: 87.5%
- LiveCodeBench: 93.5% (one of the highest publicly reported scores on this benchmark)
- GSM8K: 92.6%
- Codeforces: 3206 rating
The model supports three reasoning effort modes: Non-think (fast inference), Think High, and Think Max. This tiered structure builds on the reasoning approach DeepSeek introduced with its R1-series models and is now integrated into the V4 generation.
DeepSeek-V4-Flash is a smaller companion variant optimized for speed and cost-efficient inference. Exact parameter counts for the Flash variant were not disclosed at launch.
Both models are available on Hugging Face under the MIT license, meaning they can be used commercially, fine-tuned, and redistributed without royalties.
Context
DeepSeek has moved faster than most Western labs on open-weight model releases. The V3 generation, first released in late 2024, posted unexpectedly strong benchmark results and triggered significant debate about the compute efficiency of Chinese AI labs. V4 continues that pattern, adding a 1M-token context window and explicitly targeting coding and agentic workloads.
DeepSeek's technical report for V4 is titled "DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence," signaling that the primary engineering focus was sustaining intelligence at scale across long contexts, not just maximizing top-line benchmark scores.
The lab also continues to release under MIT, maintaining a policy of broad permissive access that distinguishes it from labs like Mistral (which gates larger models) or Meta (which ships Llama models under its own community license with use restrictions).
Why it matters
A 1.6T-parameter open-weight model with 1M context under the MIT license is a significant data point in the open vs. closed frontier debate. Enterprise developers who previously had to rely on proprietary APIs for long-context coding and agentic tasks now have a self-hostable alternative with competitive benchmark results.
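To put "self-hostable" in perspective, here is a back-of-the-envelope sizing sketch. It uses only the parameter counts reported above (1.6 trillion total, 49 billion active). The bytes-per-parameter values are illustrative assumptions, not the precision DeepSeek ships, and the estimate covers weights only, ignoring KV cache, activations, and runtime overhead.

```python
# Rough sizing for a mixture-of-experts model, using the figures reported for V4-Pro.
TOTAL_PARAMS = 1.6e12   # total parameters (reported)
ACTIVE_PARAMS = 49e9    # active parameters (reported)

# Assumed storage precisions, for illustration only (not taken from the release).
BYTES_PER_PARAM = {"16-bit": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def active_fraction(total: float, active: float) -> float:
    """Share of the total weights that are active."""
    return active / total


def weight_terabytes(params: float, bytes_per_param: float) -> float:
    """Storage for the weights alone, in decimal terabytes (1 TB = 1e12 bytes)."""
    return params * bytes_per_param / 1e12


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    print(f"active share: {share:.1%}")
    for label, width in BYTES_PER_PARAM.items():
        size = weight_terabytes(TOTAL_PARAMS, width)
        print(f"{label}: ~{size:.1f} TB of weights")
```

Under these assumptions, only about 3% of the weights are active, yet all 1.6 trillion parameters still have to be stored: roughly 0.8 TB at 4-bit up to 3.2 TB at 16-bit. In practice that points to a multi-accelerator deployment rather than a single-device setup.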
The LiveCodeBench score of 93.5% is particularly striking. LiveCodeBench continuously collects competition problems, so models can be evaluated on problems published after their training cutoffs, which makes the benchmark harder to game through memorization. A score in this range puts DeepSeek-V4-Pro alongside or ahead of several closed-source frontier models on coding tasks.
For the wider AI ecosystem, V4's release continues to compress the time gap between closed-source frontier capabilities and what's available to the open-weight community.
Corroborating sources
- Huggingface.co
https://huggingface.co/deepseek-ai/DeepSeek-V4-Pro
“DeepSeek-V4-Pro-Max...achieves top-tier performance in coding benchmarks and significantly bridges the gap with leading closed-source models on reasoning and agentic tasks.”