JetBrains releases Mellum2, an open 12B Mixture-of-Experts model for code
On June 1, 2026, JetBrains released Mellum2, a 12-billion-parameter Mixture-of-Experts (MoE) model trained from scratch on natural language and code that activates only 2.5 billion parameters per token. The IDE maker announced the model in a post on the Hugging Face blog and released it under the Apache 2.0 license. It positions Mellum2 as a "focal" component for orchestration, retrieval-augmented generation (RAG), and sub-agent roles inside larger AI systems, not as a general-purpose chat replacement.
What's new
JetBrains describes the launch in plain terms on the Hugging Face blog: "Today we're releasing Mellum2, an open Mixture-of-Experts model optimized for low-latency text-and-code workloads." The post calls it "a 12B-parameter Mixture-of-Experts model trained from scratch on natural language and code."
The headline specs:
- Sparse activation. "The model activates only 2.5B parameters per token, making it efficient for high-throughput, low-latency inference."
- Throughput claim. "Compared with similar-sized models, Mellum2 delivers competitive benchmark performance while achieving more than 2x faster inference."
- License. "It is released under the Apache 2.0 license."
- Scope. "Mellum2 is intentionally focused on text and code rather than multimodal tasks."
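To put the sparse-activation figure in perspective, here is the back-of-envelope arithmetic. It assumes the common rule of thumb that a decoder forward pass costs roughly 2 FLOPs per active parameter per token, and it ignores attention cost over the context. The dense 12B comparison model is hypothetical, not a baseline JetBrains reports benchmarking against.

```python
# Back-of-envelope compute estimate: sparse MoE vs. a hypothetical dense model.
# Assumption: forward pass ~= 2 FLOPs per *active* parameter per token
# (attention over the context window is ignored).

TOTAL_PARAMS = 12e9    # total parameters reported for Mellum2
ACTIVE_PARAMS = 2.5e9  # parameters activated per token, as reported


def forward_flops_per_token(active_params: float) -> float:
    """Approximate forward-pass FLOPs per token (2 * active parameters)."""
    return 2.0 * active_params


active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
dense_flops = forward_flops_per_token(TOTAL_PARAMS)  # hypothetical dense 12B

print(f"active fraction:       {active_fraction:.1%}")
print(f"MoE FLOPs/token:       {moe_flops:.2e}")
print(f"dense 12B FLOPs/token: {dense_flops:.2e}")
print(f"compute ratio:         {dense_flops / moe_flops:.1f}x")
```

Under these assumptions about 20.8% of the weights run per token, so the MoE needs roughly 4.8x fewer FLOPs per token than a dense 12B model, although all 12B weights must still be held in memory. The reported "more than 2x" speedup is smaller than that compute ratio. That gap is plausible, because real inference is often limited by memory bandwidth and routing overhead rather than raw FLOPs, and because JetBrains compares against "similar-sized models" rather than one specific dense baseline.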
JetBrains lists the intended use cases as routing and orchestration, RAG pipelines, sub-agents, and private deployments; direct user-facing chat is explicitly not among them.
Context
JetBrains is best known as the maker of IntelliJ IDEA, PyCharm, and the Kotlin language. The first-generation Mellum was a 4B-parameter code-completion model built to power cloud-based completion in JetBrains AI Assistant, and JetBrains later published its base weights on Hugging Face. Mellum2 is the successor, with a larger total parameter count and a sparse-activation MoE design that keeps per-token compute low.
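The per-token saving comes from routing. In a typical top-k MoE layer, a small gating network scores every expert, but only the k highest-scoring experts actually run for each token. The following is a minimal pure-Python sketch of that generic technique. The source does not disclose Mellum2's expert count, its k, or its exact gating scheme, so `top_k_moe`, the toy experts, and `k=2` are illustrative assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_moe(
    token: Vector,
    gate_logits: Sequence[float],
    experts: Sequence[Callable[[Vector], Vector]],
    k: int = 2,
) -> Vector:
    """Route one token to its top-k experts and mix their outputs.

    Only the k selected experts are evaluated; the others are skipped,
    which is where the per-token compute saving comes from.
    """
    probs = softmax(gate_logits)
    # Indices of the k highest-probability experts.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    # Renormalize gate weights over the chosen experts only.
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(token)
    for i in chosen:
        y = experts[i](token)  # evaluate only the selected expert
        w = probs[i] / norm
        out = [o + w * yi for o, yi in zip(out, y)]
    return out


if __name__ == "__main__":
    # Four toy "experts" that scale their input by 1, 2, 3, and 4.
    experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
    print(top_k_moe([1.0, 2.0], [0.0, 0.0, 5.0, 5.0], experts, k=2))
```

With gate logits `[0.0, 0.0, 5.0, 5.0]`, only the third and fourth experts run, each with weight 0.5, so the example prints `[3.5, 7.0]`. In a real model the gate logits come from a learned projection of the token's hidden state rather than being passed in directly.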
The release lands as many recent open coding models, from JetBrains and others, converge on a similar pattern: total parameter counts in the low-to-mid tens of billions, sparse activation to control latency, and a deliberate split between "focal" models that handle structured intermediate work and larger generalist models that handle reasoning and open-ended chat. JetBrains explicitly frames Mellum2 as the focal piece in that split.
Why it matters
The signal here is architectural positioning rather than raw frontier capability, which JetBrains does not claim. The pitch is that production agent systems do not need a single large model at every step: many sub-agent and routing roles are dominated by latency, throughput, and unit-cost economics, and an MoE with a small active-parameter count fits those roles better than a dense 12B model would. If the "more than 2x faster inference" figure holds in real workloads, Mellum2 could be cheap enough to serve as a glue layer between heavier reasoning models.
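The "glue layer" role can be pictured as a dispatcher that sends structured, non-user-facing steps to a focal model and everything else to a larger generalist. This is a hypothetical sketch: the task kinds, the `Model` labels, and `pick_model` are illustrative names, not part of any JetBrains API, and real orchestrators usually route on richer signals than a task label.

```python
from dataclasses import dataclass
from enum import Enum


class Model(Enum):
    FOCAL = "focal"            # small-active-parameter model (e.g. Mellum2)
    GENERALIST = "generalist"  # larger model for reasoning and open-ended chat


# Hypothetical task kinds standing in for "structured intermediate work".
FOCAL_KINDS = frozenset({"route", "rag_rerank", "extract", "complete_code"})


@dataclass(frozen=True)
class Task:
    kind: str
    user_facing: bool = False


def pick_model(task: Task) -> Model:
    """Route structured, non-user-facing steps to the focal model.

    User-facing replies and unrecognized task kinds fall through to the
    generalist, matching the "not a chat replacement" positioning.
    """
    if task.user_facing:
        return Model.GENERALIST
    if task.kind in FOCAL_KINDS:
        return Model.FOCAL
    return Model.GENERALIST


if __name__ == "__main__":
    pipeline = [
        Task("route"),
        Task("rag_rerank"),
        Task("plan"),
        Task("answer", user_facing=True),
    ]
    for t in pipeline:
        print(t.kind, "->", pick_model(t).value)
```

For that pipeline, `pick_model` returns focal, focal, generalist, generalist: the cheap model absorbs the high-volume intermediate calls, and the expensive one is reserved for planning and the final user-facing reply.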
Apache 2.0 also matters. It removes the licensing friction that comes with restricted-use weights, a concern that weighs most heavily on teams building private-deployment agent pipelines. JetBrains has both the distribution (its IDE installed base) and an internal benchmark surface (real-world code completions) to refine the model in a tight loop. Subsequent Mellum revisions are therefore worth watching as a test of whether a niche, code-first model line can keep pace with the broader open-weights field.
Corroborating sources
- Hugging Face (huggingface.co)
https://huggingface.co/blog/JetBrains/mellum2-launch
“Mellum2 is intentionally focused on text and code rather than multimodal tasks.”