Microsoft AI launches seven in-house MAI models at Build 2026, led by MAI-Thinking-1

Jun 2, 2026 · published Jun 3, 2026

Microsoft on June 2, 2026 used its Build developer conference to unveil seven foundation models built from scratch by its in-house Microsoft AI division — the first time the company has rolled out a full family of homegrown models, and the clearest signal yet that Microsoft intends to reduce its dependence on OpenAI for frontier capabilities. The launch was announced by Microsoft AI chief Mustafa Suleyman and detailed by Microsoft in its Build 2026 wrap-up, spanning reasoning, agentic coding, image generation and editing, transcription, and speech synthesis.

What's new

The MAI family unveiled at Build 2026 includes:

MAI-Thinking-1 — Microsoft AI's first reasoning model. Microsoft describes it as "a mid-sized, 35 billion active parameter model with a 256K context window built for high efficiency and performance, but importantly, at a low-token cost." In Microsoft's own evaluations, "independent raters prefer it to Sonnet 4.6, and it matches Opus 4.6 on coding abilities on SWE Bench Pro." The model is launching in private preview on Microsoft Foundry.
MAI-Code-1 and MAI-Code-1-Flash — "inference efficient coding model[s] tuned for GitHub," rolling into GitHub Copilot and VS Code, with MAI-Code-1-Flash sized at five billion active parameters for low-cost, high-throughput agentic use.
MAI-Image-2.5 and a MAI-Image-2.5-Flash variant — Microsoft's first models to serve both text-to-image and image-to-image workloads. Microsoft says the family currently sits at "#3 on the Arena AI leaderboard" for text-to-image and "#2 on the Arena AI leaderboard, surpassing Nano Banana 2," for image-to-image.
MAI-Transcribe-1.5 — a transcription model Microsoft says "combines state-of-the-art accuracy across 43 languages, with streaming coming soon."
MAI-Voice-2 and a MAI-Voice-2-Flash variant — speech models "now available in more than 15 additional languages with new voice options."

Alongside the MAI family, Microsoft announced a Mayo Clinic Frontier AI Model for clinical reasoning, co-developed with Mayo Clinic and deployed first inside Mayo Clinic's own environment. Microsoft is presenting Mayo Clinic as the reference customer for its new "Frontier Tuning" workflow, which lets enterprises continue training MAI weights on their own data inside their own infrastructure. The image, transcription, and voice models are generally available on Microsoft Foundry and the MAI Playground today; the MAI family is also being made available through the third-party developer platforms OpenRouter, Fireworks, and Baseten.

Context

Microsoft frames the launch as a clean break from model-distillation practices common across the industry: MAI-Thinking-1 is "trained from scratch with zero distillation on enterprise grade, clean and commercially licensed data you can build on with confidence." That language matters because it positions the MAI series as an option for regulated buyers that have been nervous about data provenance and chain-of-custody on third-party reasoning models. The Mayo Clinic partnership is the visible proof point for that pitch.

Microsoft remains a significant OpenAI investor and continues to ship OpenAI models through Azure and Microsoft 365 Copilot. But Build's framing — and Suleyman's commentary in adjacent Microsoft AI posts about "long term self-sufficiency for Microsoft and our partners" — emphasizes parallelism over exclusivity. After today, Microsoft can credibly point to in-house models at every layer it currently buys from OpenAI: reasoning, coding, image, transcription, voice.

Why it matters

Until now, MAI was best known for consumer Copilot work and a small number of research-only model disclosures. A flagship reasoning model paired with a coding model going directly into GitHub Copilot rewrites how Microsoft talks about its model strategy. It also adds a fourth serious hyperscaler-tier developer model surface alongside OpenAI, Anthropic, and Google. The distribution choice, shipping the new models through OpenRouter, Fireworks, and Baseten on day one alongside Foundry, reads as a deliberate signal that Microsoft AI wants developer adoption beyond captive Azure workloads.

For enterprise buyers the practical question is whether Microsoft's claims for MAI-Thinking-1, that independent raters prefer it to Anthropic's Sonnet 4.6 and that it matches Opus 4.6 on coding on SWE Bench Pro, hold up under independent evaluation. If they do, Microsoft has just given every Azure and Copilot customer a credible single-vendor alternative to its biggest model partner, and changed the bargaining position for everyone weighing OpenAI vendor risk against operational convenience.

Corroborating sources

Blogs.microsoft
https://blogs.microsoft.com/blog/2026/06/02/microsoft-build-2026-be-yourself-at-work/
“It's a mid-sized, 35 billion active parameter model with a 256K context window built for high efficiency and performance, but importantly, at a low-token cost.”
News.microsoft
https://news.microsoft.com/build-2026-live-blog/microsoft-build-2026-live/
“The Microsoft AI Superintelligence Team recently finished the newest generation of its in-house models, which now support reasoning, code generation and text-to-image and image-to-image workloads for developers.”
