ReleaseZ.ai (Zhipu AI)Verified

Z.ai releases GLM-5.2 with 1M-token lossless context and top open-source coding benchmark scores

No audio yetJun 16, 2026published Jun 18, 2026

Z.ai released GLM-5.2 on June 16, 2026, an open-source model with a 1M-token lossless context window and benchmark scores that place it at the front of available open-source coding models. The model scores 81.0 on Terminal-Bench 2.1 and 62.1 on SWE-bench Pro, benchmarks designed for realistic software engineering agent evaluation.

What's new

1M token lossless context: GLM-5.2 supports 1 million input tokens with full-fidelity retention across the entire context — no compression or sliding-window truncation — enabling project-level codebase analysis where the full dependency graph needs to stay in scope
128K max output tokens: Generous output budget for code generation, document-heavy workflows, and long-form structured responses
Benchmark performance:
- Terminal-Bench 2.1: 81.0 (highest-ranked open-source at release)
- SWE-bench Pro: 62.1
Reasoning modes: Multiple adaptive thinking modes, with the model selecting reasoning depth based on task complexity
Developer tooling: Native function calling, structured output (JSON), MCP server integration, context caching, and full streaming support
Available via the Z.ai developer API

Context

GLM-5.2 is the latest iteration in Z.ai's rapidly evolving GLM-5 series. The base GLM-5 launched in February 2026 benchmarking against frontier closed-source models; GLM-5-Turbo (March) expanded throughput for agentic workloads; GLM-5.1 (April) introduced autonomous operation for tasks running up to eight hours. Each release has pushed toward longer, more coherent execution across multi-step tasks.

The "lossless" characterization for the 1M context window addresses a real limitation in earlier long-context models, where retrieval quality degraded sharply past ~200K tokens due to positional encoding weaknesses or sparse attention approximations. Z.ai's documentation states that GLM-5.2 "maintains more stable performance at ultra-long context, even surpassing Opus in select real-world benchmarks."

The MCP integration is a deliberate production-readiness signal: rather than positioning GLM-5.2 as a research artifact, Z.ai is shipping it as a drop-in component for agent pipelines already built around the MCP tool ecosystem.

Why it matters

The open-source long-context race has meaningful commercial consequences. Teams running self-hosted inference — whether for data-residency requirements, cost control, or customization — have had fewer credible options for tasks that require maintaining a large codebase in context simultaneously. GLM-5.2's Terminal-Bench 2.1 score of 81.0 is particularly relevant here: that benchmark simulates real coding agent tasks in terminal environments, not just isolated puzzle-solving.

For enterprises evaluating open-weight models for deployment, GLM-5.2 now sits alongside DeepSeek V4-Pro as one of the few open-source models that can plausibly handle repository-scale tasks. Z.ai's purpose-built long-context architecture may offer a strong inference-cost-to-capability trade-off for teams that do not need trillion-parameter-scale reasoning.

The SWE-bench Pro score of 62.1 also signals readiness for CI/CD integration — automated pull-request review, test generation, and codebase refactoring pipelines that require consistent, multi-file reasoning across full project context.

Corroborating sources

Docs.z
https://docs.z.ai/release-notes
“Supports 1M lossless context, significantly improving long-horizon task capabilities”
Docs.z
https://docs.z.ai/guides/llm/glm-5.2
“GLM-5.2 maintains more stable performance at ultra-long context, even surpassing Opus in select real-world benchmarks.”

What's new

1M token lossless context: GLM-5.2 supports 1 million input tokens with full-fidelity retention across the entire context — no compression or sliding-window truncation — enabling project-level codebase analysis where the full dependency graph needs to stay in scope

128K max output tokens: Generous output budget for code generation, document-heavy workflows, and long-form structured responses

Benchmark performance:

Terminal-Bench 2.1: 81.0 (highest-ranked open-source at release)
SWE-bench Pro: 62.1

Reasoning modes: Multiple adaptive thinking modes, with the model selecting reasoning depth based on task complexity

Developer tooling: Native function calling, structured output (JSON), MCP server integration, context caching, and full streaming support

Available via the Z.ai developer API

Context

Why it matters