Moonshot AI releases Kimi K2.7 Code on Hugging Face: a 1-trillion-parameter MoE coding model with 32B active weights
Moonshot AI has released Kimi K2.7 Code as an open-weights model on Hugging Face, extending the Kimi K2 lineage with a coding-specialist architecture that packs 1 trillion total parameters into a Mixture-of-Experts design that activates only 32 billion parameters per token.
What's new
According to the model card on Hugging Face, "Kimi K2.7 Code is a coding-focused agentic model built upon Kimi K2.6. With substantial improvements on real-world long-horizon coding tasks, it strengthens end-to-end task completion across complex software engineering workflows while improving token efficiency, reducing thinking-token usage by approximately 30% compared with Kimi K2.6."
The architecture is Mixture-of-Experts with 1 trillion total parameters and 32 billion activated per token, a 400M-parameter MoonViT vision encoder, 256k context length, and support for MLA (Multi-head Latent Attention). The model is released under a Modified MIT license. It always runs in thinking mode — instant (non-thinking) mode is not supported.
Benchmark results from Moonshot's model card compare K2.7 Code against its predecessor K2.6, GPT-5.5, and Claude Opus 4.8 on coding and agentic tasks:
- Kimi Code Bench v2: K2.7 Code scores 62.0, up from K2.6's 50.9 — a 22% improvement. GPT-5.5 scores 69.0; Opus 4.8 scores 67.4.
- Program Bench: K2.7 Code reaches 53.6 vs K2.6's 48.3. GPT-5.5 leads at 69.1; Opus 4.8 scores 63.8.
- MLS Bench Lite: K2.7 Code scores 35.1 vs K2.6's 26.7. GPT-5.5 scores 35.5; Opus 4.8 leads at 42.8.
- MCP Mark Verified (agentic tool use): K2.7 Code scores 81.1, up from K2.6's 72.8. GPT-5.5 leads at 92.9; Opus 4.8 scores 76.4.
Deployment is supported via vLLM, SGLang, and KTransformers. An official API is available at platform.moonshot.ai with an OpenAI/Anthropic-compatible endpoint.
Context
Kimi K2.7 Code builds on top of Kimi K2.6, which Moonshot released in April 2026 and positioned as an open-source state-of-the-art model for agent, coding, and visual understanding tasks. K2.6 supports thinking and non-thinking modes; K2.7 Code drops non-thinking mode entirely to optimize for deep reasoning in extended coding workflows.
The 1T/32B active MoE architecture places Kimi K2.7 Code in the same parameter class as DeepSeek-V3 and other large Chinese open-weight models. The 30% reduction in thinking-token usage relative to K2.6 addresses a practical cost concern for teams running the model at scale — inference charges on MoE models accumulate through active parameters and output tokens, not total parameters.
Why it matters
For the open-source coding model leaderboard, Kimi K2.7 Code narrows the gap between open-weight and proprietary models on real-world software engineering benchmarks. The MLS Bench Lite score of 35.1 — which tests AI-invented machine learning methods over a 5-hour autonomous research window — is competitive with GPT-5.5 (35.5), though Claude Opus 4.8 (42.8) still leads on that task.
The Modified MIT license makes Kimi K2.7 Code broadly deployable in commercial settings, subject to any modifications in the license terms. Combined with the vLLM and SGLang deployment paths, it is a practical option for teams that want a large, capable coding model they can run on their own infrastructure rather than routing through a closed API.
Corroborating sources
- Huggingface.co
https://huggingface.co/moonshotai/Kimi-K2.7-Code
“Kimi K2.7 Code is a coding-focused agentic model built upon Kimi K2.6. With substantial improvements on real-world long-horizon coding tasks, it strengthens end-to-end task completion across complex software engineering workflows while improving token efficiency, reducing thinking-token usage by approximately 30% compared with Kimi K2.6.”