Cohere releases North Mini Code, an open-source agentic coding model with 30B parameters and 3B active on a mixture-of-experts architecture
Cohere on June 9, 2026 released North Mini Code — the company's first agentic coding model and the opening release of its next generation of models. The model uses a mixture-of-experts architecture with 30 billion total parameters and just 3 billion active, ships under an Apache 2.0 license, and is freely available on Hugging Face, the Cohere API, Model Vault, and OpenRouter.
What's new
Architecture and specs:
- 30B total parameters, 3B active (sparse MoE)
- 256K total context; 64K max generation
- Minimum deployment: 1× H100 @ FP8
- Apache 2.0 license — commercial use and modification permitted without restriction
Performance:
- Scores 33.4 on the Artificial Analysis Coding Index
- Up to 2.8× higher output throughput than Devstral Small 2
- 30% lower inter-token latency compared to Devstral Small 2
Design focus: North Mini Code is "optimized for code generation, agentic software engineering, and terminal tasks." It handles orchestration natively, including "understanding and orchestrating sub-agents, mapping systems architecture, and running code reviews."
Deployment flexibility: The model is deployable on-prem or locally, targeting teams that need to keep code generation on their own infrastructure rather than routing through third-party APIs.
Context
Cohere has built its market positioning around enterprise AI with a focus on data sovereignty — the ability to run AI inside a customer's own cloud or on-premises environment rather than relying on externally hosted APIs. North Mini Code extends that positioning into the code generation vertical, where the data-sovereignty argument is particularly strong: internal codebases, proprietary libraries, and unreleased product code represent sensitive intellectual property that many engineering teams are unwilling to route through external inference endpoints.
The MoE architecture — 30B parameters total, 3B active — mirrors the design strategy used by Mistral's Devstral and DeepSeek's MoE models: achieve high throughput and low per-token cost by activating only a fraction of parameters per forward pass. At 1× H100 as the stated minimum, North Mini Code is deployable on a single high-end GPU, a practical threshold for teams without dedicated inference clusters.
Cohere explicitly frames the release as advancing "sovereign AI" — a positioning it has used to differentiate from OpenAI and Anthropic, whose primary offerings are API-based. The Apache 2.0 license underscores that framing, removing licensing uncertainty for commercial deployments.
Why it matters
The agentic coding model market has grown sharply in 2026, with Devstral (Mistral), Claude Code (Anthropic), and proprietary models from major platforms all competing for developer and enterprise adoption. North Mini Code enters that market with three differentiating properties: on-prem deployment without vendor lock-in, MoE efficiency for high-throughput use cases, and a first-in-class position within Cohere's next-gen model family.
For enterprises that have been avoiding AI coding assistance due to data residency requirements — sectors like finance, defense, healthcare, and government — a capable open-weights model that runs on a single H100 removes the primary technical barrier. The 256K context window is large enough to hold significant codebases, and the native sub-agent orchestration capability means North Mini Code can serve as a coordinator in multi-step engineering workflows, not just a single-turn code completer.
The 2.8× throughput advantage over Devstral Small 2 is a commercially meaningful claim, particularly for teams considering running code generation at scale within automated pipelines.
Corroborating sources
- Cohere
https://cohere.com/blog/north-mini-code
“Cohere's first agentic coding model, and the inaugural member of our next generation of powerful models.”