MiniMax releases M3: open-weight 427B model with 1M context, native multimodality, and 9× long-context prefill speedup
Shanghai-based AI lab MiniMax released MiniMax-M3 on June 1, 2026. It is a 427-billion-parameter open-weight model with a 1-million-token context window, native multimodal understanding, and a new sparse attention architecture that MiniMax says delivers roughly order-of-magnitude efficiency gains at long context.
What's new
MiniMax M3 is a Mixture-of-Experts model with approximately 427 billion total parameters and 23 billion activated parameters per token. Its most significant technical contribution is MiniMax Sparse Attention (MSA), a sparse attention operator purpose-built for million-token contexts.
Performance at scale: Compared to its predecessor M2, M3 delivers 9× prefill and 15× decode speedups at 1M context and reduces per-token compute to 1/20 of M2's, according to MiniMax. That narrows the practical speed gap that has made 1M-context models expensive to run in production.
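Because prefill and decode are accelerated by different factors, the overall speedup on a request depends on how its time splits between the two phases. The sketch below works through that arithmetic. The baseline timings are hypothetical placeholders, not measured M2 numbers, and the function name is illustrative only.

```python
def end_to_end_speedup(prefill_s: float, decode_s: float,
                       prefill_gain: float = 9.0,
                       decode_gain: float = 15.0) -> float:
    """Overall speedup when each phase is accelerated by its own factor."""
    baseline = prefill_s + decode_s
    accelerated = prefill_s / prefill_gain + decode_s / decode_gain
    return baseline / accelerated


# Hypothetical baseline: 90 s of prefill and 30 s of decode for one request.
speedup = end_to_end_speedup(prefill_s=90.0, decode_s=30.0)
print(f"overall speedup: {speedup:.1f}x")
```

A prefill-heavy request lands near 9×, while a decode-heavy one approaches 15×.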
Native multimodality from the ground up: Unlike models that add vision capabilities post-training, M3 "undergoes mixed-modality training from the very first step, enabling deeper semantic fusion across text, image, and video." It supports image-text-to-text and video-text-to-text tasks natively.
Two reasoning modes: M3 offers a thinking mode for complex reasoning and agentic tasks, and a non-thinking mode for latency-sensitive applications.
Key specs:
- 427B total parameters (MoE), 23B active per token
- 1M token context window
- Text, image, and video input
- BF16/F32 precision
- License: minimax-community
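For a rough sense of what these specs imply for self-hosting, the sketch below estimates the storage needed for the weights alone. It assumes every parameter is stored in a single dtype, which the "BF16/F32" listing does not confirm, and it ignores the KV cache, activations, and runtime overhead. Sizes are in decimal gigabytes.

```python
TOTAL_PARAMS = 427e9   # total parameters, from the spec list
ACTIVE_PARAMS = 23e9   # parameters activated per token, from the spec list
BYTES_PER_PARAM = {"BF16": 2, "F32": 4}  # standard widths of these dtypes


def weight_footprint_gb(num_params: float, dtype: str) -> float:
    """Weights-only size in decimal GB, assuming one uniform dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_footprint_gb(TOTAL_PARAMS, dtype):,.0f} GB")

# Share of the model that participates in each token's forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
print(f"active per token: {active_fraction:.1%}")
```

All 427B parameters must still be resident in memory even though only about 5% are used per token; the MoE design lowers compute, not weight storage.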
Benchmarks: M3 scores 59.0% on SWE-Bench Pro, which MiniMax claims exceeds GPT-5.5 and Gemini 3.1 Pro on the same benchmark. The model targets long-horizon agentic coding and multimodal reasoning.
Availability: The API launched June 1, priced at approximately 2.1 yuan ($0.30) per million input tokens and 8.4 yuan ($1.20) per million output tokens for contexts up to 512K. Model weights were released on Hugging Face within 10 days of launch.
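To make the quoted prices concrete, here is a minimal cost estimate for a single request. It uses the approximate USD figures above and rests on two assumptions: that "512K" means 512,000 tokens, and that the limit applies to input plus output combined. Pricing above that size is not stated, so the sketch refuses to guess.

```python
INPUT_USD_PER_M = 0.30    # approx. USD per million input tokens (quoted above)
OUTPUT_USD_PER_M = 1.20   # approx. USD per million output tokens (quoted above)
MAX_PRICED_CONTEXT = 512_000  # assumption: "512K" read as 512,000 tokens


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at the quoted up-to-512K rates."""
    if input_tokens + output_tokens > MAX_PRICED_CONTEXT:
        raise ValueError("pricing above 512K context is not given in the article")
    return (input_tokens / 1e6) * INPUT_USD_PER_M \
        + (output_tokens / 1e6) * OUTPUT_USD_PER_M


# Example: a 400K-token prompt with a 4K-token answer.
cost = request_cost_usd(400_000, 4_000)
print(f"estimated cost: ${cost:.4f}")
```

At these rates a long prompt dominates the bill: the 400K input tokens account for about $0.12 of the roughly $0.125 total.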
Context
MiniMax is one of China's leading AI labs, having previously shipped M2 as an early 1M-context model. M3 represents a significant leap in efficiency: the MSA architecture directly addresses the compute cost of dense attention, which grows quadratically with sequence length and makes long-context inference impractical at scale.
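The scaling problem can be illustrated with a simple count of query-key interactions. The sketch below is not MSA itself, whose mechanism is not described here. It compares dense attention with a generic sparse scheme in which each query attends to at most k keys; the value of k is a hypothetical chosen purely for illustration, and causal masking and constant factors are ignored.

```python
def dense_pairs(n: int) -> int:
    """Query-key pairs scored by dense attention: grows as n squared."""
    return n * n


def sparse_pairs(n: int, k: int) -> int:
    """Pairs scored when each query attends to at most k keys: grows as n * k."""
    return n * min(k, n)


K = 4096  # hypothetical per-query budget, NOT a published MSA parameter

for n in (8_192, 131_072, 1_000_000):
    ratio = dense_pairs(n) / sparse_pairs(n, K)
    print(f"n={n:>9,}  dense={dense_pairs(n):.2e}  "
          f"sparse={sparse_pairs(n, K):.2e}  ratio={ratio:.1f}x")
```

With a fixed budget, the advantage of the sparse scheme grows linearly with context length, which is why sparse attention pays off most at the million-token scale.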
The open-weight release under a minimax-community license follows a pattern common among Chinese AI labs: releasing weights to build an ecosystem and drive developer adoption while retaining control over commercial-use terms.
The timing puts M3 in competition with Moonshot AI's Kimi K2 series and DeepSeek V4, both of which also target long-horizon agentic performance. MiniMax's multimodal-from-scratch approach differs from most competitors, which add vision capabilities to base language models.
Why it matters
A 427B MoE model that runs 1M-context inference at 1/20 the per-token compute of M2 is a meaningful advance, not an incremental update. For developers who need long-context reasoning across documents, code bases, or multimedia inputs without paying premium inference rates, M3 is now a credible option to evaluate alongside closed-source models.
The 9× prefill speedup is the number that matters most for production workloads: prefill time sets the time to first token on long prompts, so a faster prefill makes 1M-context calls practical for latency-constrained applications, not just batch processing. If the SWE-Bench Pro claims hold up under independent evaluation, M3 would be notable as the first open-weight model to reach frontier coding performance alongside frontier-scale context.
Corroborating sources
- Huggingface.co
https://huggingface.co/MiniMaxAI/MiniMax-M3
“M3 delivers 9× prefill and 15× decode speedups compared to M2 at 1M context, reducing per-token compute to 1/20.”