Cohere releases North Mini Code, an open-source agentic coding model with 30B parameters and 3B active on a mixture-of-experts architecture

Jun 9, 2026 (published Jun 12, 2026)

Cohere on June 9, 2026 released North Mini Code — the company's first agentic coding model and the opening release of its next generation of models. The model uses a mixture-of-experts architecture with 30 billion total parameters and just 3 billion active, ships under an Apache 2.0 license, and is freely available on Hugging Face, the Cohere API, Model Vault, and OpenRouter.

What's new

Architecture and specs:

30B total parameters, 3B active (sparse MoE)
256K total context; 64K max generation
Minimum deployment: 1× H100 @ FP8
Apache 2.0 license: commercial use and modification permitted, subject to the license's attribution and notice requirements

Performance:

Scores 33.4 on the Artificial Analysis Coding Index
Up to 2.8× higher output throughput than Devstral Small 2
30% lower inter-token latency compared to Devstral Small 2

Design focus: North Mini Code is "optimized for code generation, agentic software engineering, and terminal tasks." It handles orchestration natively, including "understanding and orchestrating sub-agents, mapping systems architecture, and running code reviews."

Deployment flexibility: The model is deployable on-prem or locally, targeting teams that need to keep code generation on their own infrastructure rather than routing through third-party APIs.

Context

Cohere has built its market positioning around enterprise AI with a focus on data sovereignty — the ability to run AI inside a customer's own cloud or on-premises environment rather than relying on externally hosted APIs. North Mini Code extends that positioning into the code generation vertical, where the data-sovereignty argument is particularly strong: internal codebases, proprietary libraries, and unreleased product code represent sensitive intellectual property that many engineering teams are unwilling to route through external inference endpoints.

The MoE architecture, with 30B total parameters and 3B active, follows the sparse-activation strategy used by models such as Mistral's Mixtral and DeepSeek's MoE models: achieve high throughput and low per-token cost by activating only a fraction of parameters per forward pass. With 1× H100 at FP8 as the stated minimum, North Mini Code can run on a single high-end GPU, a practical threshold for teams without dedicated inference clusters.
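The arithmetic behind the single-GPU claim can be sketched as follows. This is a rough, back-of-envelope illustration, not Cohere's sizing method. It assumes the 80 GB H100 variant and one byte per weight at FP8, and it ignores the KV cache, activations, and runtime overhead, all of which consume part of the remaining memory. The constant and function names are illustrative.

```python
# Back-of-envelope figures derived from the published specs.
TOTAL_PARAMS = 30e9       # total parameters (from the article)
ACTIVE_PARAMS = 3e9       # parameters active per token (from the article)
BYTES_PER_PARAM_FP8 = 1   # FP8 stores one byte per weight
H100_MEMORY_GB = 80       # assumption: the 80 GB H100 variant


def active_fraction(total_params, active_params):
    """Share of the weights that participate in each forward pass."""
    return active_params / total_params


def weight_footprint_gb(params, bytes_per_param):
    """Memory needed for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
weights_gb = weight_footprint_gb(TOTAL_PARAMS, BYTES_PER_PARAM_FP8)
headroom_gb = H100_MEMORY_GB - weights_gb

print(f"active fraction per token: {frac:.0%}")
print(f"FP8 weight footprint: {weights_gb:.0f} GB")
print(f"memory left for KV cache and activations: {headroom_gb:.0f} GB")
```

In an MoE model every expert has to stay resident in memory, so the weight footprint follows the total parameter count, while per-token compute follows the active count. That is why the model can need a full H100 for storage yet behave more like a 3B model in per-token cost.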

Cohere explicitly frames the release as advancing "sovereign AI" — a positioning it has used to differentiate from OpenAI and Anthropic, whose primary offerings are API-based. The Apache 2.0 license underscores that framing, removing licensing uncertainty for commercial deployments.

Why it matters

The market for agentic coding models and tools has grown sharply in 2026, with Devstral (Mistral), Claude Code (Anthropic), and proprietary models from major platforms all competing for developer and enterprise adoption. North Mini Code enters that market with three differentiating properties: on-prem deployment without vendor lock-in, MoE efficiency for high-throughput use cases, and its position as the first release in Cohere's next-generation model family.

For enterprises in sectors such as finance, defense, healthcare, and government that have avoided AI coding assistance because of data residency requirements, a capable open-weights model that runs on a single H100 removes the primary technical barrier. The 256K context window is large enough to hold a substantial portion of a codebase in a single prompt, and the native sub-agent orchestration capability means North Mini Code can serve as a coordinator in multi-step engineering workflows, not just a single-turn code completer.
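As a rough illustration of what the context figures imply for prompt size, the sketch below computes the input budget left after reserving the full generation allowance. It assumes that "K" means 1,024 tokens and that generated tokens count against the 256K total. Neither point is confirmed in the announcement, and the helper name is hypothetical.

```python
TOKENS_PER_K = 1024  # assumption: "K" denotes 1,024 tokens, not 1,000


def prompt_budget(total_context_k, reserved_output_k):
    """Tokens left for input after reserving room for generation.

    Assumes output tokens share the same window as the prompt.
    """
    if reserved_output_k > total_context_k:
        raise ValueError("reserved output exceeds total context")
    return (total_context_k - reserved_output_k) * TOKENS_PER_K


# 256K total context and 64K max generation, as listed in the specs.
budget = prompt_budget(256, 64)
print(f"prompt budget with full generation reserved: {budget} tokens")
```

Under these assumptions, roughly three quarters of the window remains for source files, tool output, and conversation history when the full generation allowance is held back.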

The 2.8× throughput advantage over Devstral Small 2 is a commercially meaningful claim, particularly for teams considering running code generation at scale within automated pipelines.
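To put the vendor-reported ratios in operational terms, the sketch below converts them into relative cost figures. It treats "up to 2.8×" as a best case, assumes GPU time for a fixed token workload scales inversely with output throughput, and assumes the per-stream token rate is the inverse of inter-token latency. The function names are illustrative, and actual results depend on batch size, hardware, and workload.

```python
def relative_gpu_time(throughput_ratio):
    """GPU time for a fixed token workload, relative to the baseline model.

    Assumes time scales inversely with output throughput.
    """
    if throughput_ratio <= 0:
        raise ValueError("throughput ratio must be positive")
    return 1.0 / throughput_ratio


def per_stream_speedup(latency_reduction):
    """Token-rate multiplier for one stream given a fractional latency cut."""
    if not 0 <= latency_reduction < 1:
        raise ValueError("latency reduction must be in [0, 1)")
    return 1.0 / (1.0 - latency_reduction)


# Figures reported against Devstral Small 2 (best case for throughput).
gpu_time = relative_gpu_time(2.8)
stream_rate = per_stream_speedup(0.30)

print(f"relative GPU time at 2.8x throughput: {gpu_time:.0%} of baseline")
print(f"per-stream token rate at 30% lower latency: {stream_rate:.2f}x")
```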

Corroborating sources

Cohere
https://cohere.com/blog/north-mini-code
“Cohere's first agentic coding model, and the inaugural member of our next generation of powerful models.”
