Z.ai releases GLM-5.2 with 1M-token lossless context and top open-source coding benchmark scores
Z.ai released GLM-5.2 on June 16, 2026, an open-source model with a 1M-token lossless context window and benchmark scores that place it at the front of available open-source coding models. The model scores 81.0 on Terminal-Bench 2.1 and 62.1 on SWE-bench Pro, benchmarks designed for realistic software engineering agent evaluation.
What's new
- 1M token lossless context: GLM-5.2 supports 1 million input tokens with full-fidelity retention across the entire context — no compression or sliding-window truncation — enabling project-level codebase analysis where the full dependency graph needs to stay in scope
- 128K max output tokens: Generous output budget for code generation, document-heavy workflows, and long-form structured responses
- Benchmark performance:
- Terminal-Bench 2.1: 81.0 (highest-ranked open-source at release)
- SWE-bench Pro: 62.1
- Reasoning modes: Multiple adaptive thinking modes, with the model selecting reasoning depth based on task complexity
- Developer tooling: Native function calling, structured output (JSON), MCP server integration, context caching, and full streaming support
- Available via the Z.ai developer API
Context
GLM-5.2 is the latest iteration in Z.ai's rapidly evolving GLM-5 series. The base GLM-5 launched in February 2026 benchmarking against frontier closed-source models; GLM-5-Turbo (March) expanded throughput for agentic workloads; GLM-5.1 (April) introduced autonomous operation for tasks running up to eight hours. Each release has pushed toward longer, more coherent execution across multi-step tasks.
The "lossless" characterization for the 1M context window addresses a real limitation in earlier long-context models, where retrieval quality degraded sharply past ~200K tokens due to positional encoding weaknesses or sparse attention approximations. Z.ai's documentation states that GLM-5.2 "maintains more stable performance at ultra-long context, even surpassing Opus in select real-world benchmarks."
The MCP integration is a deliberate production-readiness signal: rather than positioning GLM-5.2 as a research artifact, Z.ai is shipping it as a drop-in component for agent pipelines already built around the MCP tool ecosystem.
Why it matters
The open-source long-context race has meaningful commercial consequences. Teams running self-hosted inference — whether for data-residency requirements, cost control, or customization — have had fewer credible options for tasks that require maintaining a large codebase in context simultaneously. GLM-5.2's Terminal-Bench 2.1 score of 81.0 is particularly relevant here: that benchmark simulates real coding agent tasks in terminal environments, not just isolated puzzle-solving.
For enterprises evaluating open-weight models for deployment, GLM-5.2 now sits alongside DeepSeek V4-Pro as one of the few open-source models that can plausibly handle repository-scale tasks. Z.ai's purpose-built long-context architecture may offer a strong inference-cost-to-capability trade-off for teams that do not need trillion-parameter-scale reasoning.
The SWE-bench Pro score of 62.1 also signals readiness for CI/CD integration — automated pull-request review, test generation, and codebase refactoring pipelines that require consistent, multi-file reasoning across full project context.