Z.AI releases GLM-5.2: a 753B open-weight model with 1M context and a 2.9× per-token FLOPs reduction via IndexShare
Z.AI has released GLM-5.2, its latest open-weight flagship model for long-horizon tasks. The 753B-parameter model is available under an MIT license with no regional restrictions and ships with a 1M-token context window that Z.AI describes as stable across long-horizon work.
What's new
GLM-5.2 introduces several technical advances over its predecessors:
- IndexShare architecture: A new attention mechanism that shares one indexer across each group of four sparse attention layers, cutting per-token FLOPs by 2.9× at a 1M context length. That is a meaningful efficiency gain for inference at scale.
- 1M-token context: Z.AI describes the context window as stable enough to sustain long-horizon work across the full 1M tokens, rather than as a nominal upper limit.
- Multiple thinking effort levels: Configurable reasoning depth lets operators trade off between peak performance and lower latency, depending on the task.
- MIT license, no regional limits: Available without geographic access restrictions, positioning it as a globally accessible open alternative to proprietary frontier models.
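The indexer sharing described in the IndexShare item above can be illustrated with a toy sketch. The article only states that one indexer is reused across every four sparse attention layers; everything else here is an assumption made for illustration, including the top-k selection scheme and the hypothetical names `top_k_indices`, `select_keys_per_layer`, and `toy_indexer`. It is not Z.AI's implementation.

```python
import heapq
from typing import Callable, List, Sequence, Tuple

SHARE_GROUP = 4  # from the article: one indexer per four sparse attention layers


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the positions of the k highest scores, in ascending position order."""
    best = heapq.nlargest(k, range(len(scores)), key=lambda i: scores[i])
    return sorted(best)


def select_keys_per_layer(
    num_layers: int,
    indexer: Callable[[int], Sequence[float]],
    k: int,
    share_group: int = SHARE_GROUP,
) -> Tuple[List[List[int]], int]:
    """Pick key positions for every layer, running the indexer once per group."""
    selections: List[List[int]] = []
    indexer_runs = 0
    cached: List[int] = []
    for layer in range(num_layers):
        if layer % share_group == 0:
            # First layer of a group: run the (assumed) indexer and cache its picks.
            cached = top_k_indices(indexer(layer), k)
            indexer_runs += 1
        # Remaining layers in the group reuse the cached selection.
        selections.append(cached)
    return selections, indexer_runs


def toy_indexer(layer: int) -> List[float]:
    """Hypothetical relevance scores over five cached tokens."""
    return [0.1, 0.9, 0.3, 0.8, 0.2]


selections, runs = select_keys_per_layer(num_layers=8, indexer=toy_indexer, k=2)
print(runs)           # 2 indexer runs instead of 8
print(selections[3])  # [1, 3], reused from layer 0
```

In this sketch the saving comes purely from skipping three of every four indexer passes. How that translates into the reported 2.9× figure depends on architectural details the article does not give.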
Reported benchmark results are competitive with frontier-class models:
- AIME 2026: 99.2%
- GPQA-Diamond: 91.2%
- SWE-bench Pro: 62.1%
- Terminal Bench 2.1: 82.7%
- MCP-Atlas (agentic): 76.8%
The model is available on Hugging Face as zai-org/GLM-5.2 and on Z.AI's API platform.
Context
GLM-5.2 is the third major iteration in Z.AI's GLM-5 family. GLM-5 was the family's initial open-weight release; GLM-5.1 added long-horizon task capability and was benchmarked against Claude Opus 4.6 and GPT on SWE-bench Pro. GLM-5.2 extends that trajectory with the IndexShare efficiency improvement and a stable 1M-token context.
The GLM series has positioned itself as China's most capable domestically developed open-weight model family, with Z.AI now publishing the underlying technical work (arXiv:2602.15763) alongside the weights release.
OpenRouter's model registry picked up the GLM-5.2 release today, making the model immediately accessible through third-party inference providers in addition to Z.AI's own platform.
Why it matters
The 2.9× FLOPs reduction via IndexShare is the headline technical contribution. Running a 753B model at 1M context is computationally expensive; an architecture that cuts per-token compute by nearly 3× at that context length changes the economic picture for inference providers and self-hosters alike.
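The 2.9× figure can be turned into a share of baseline compute with a little arithmetic. The sketch below covers per-token FLOPs only, using the article's number; the function names are hypothetical, and actual serving cost also depends on factors the article does not quantify, such as memory bandwidth and batching.

```python
def remaining_compute(reduction_factor: float) -> float:
    """Fraction of baseline per-token FLOPs left after an N-fold reduction."""
    if reduction_factor <= 0:
        raise ValueError("reduction_factor must be positive")
    return 1.0 / reduction_factor


def compute_saved(reduction_factor: float) -> float:
    """Fraction of baseline per-token FLOPs eliminated."""
    return 1.0 - remaining_compute(reduction_factor)


factor = 2.9  # reported per-token FLOPs reduction at 1M context
print(f"{remaining_compute(factor):.1%}")  # 34.5%
print(f"{compute_saved(factor):.1%}")      # 65.5%
```

Put differently, a 2.9× reduction leaves roughly a third of the original per-token compute at that context length.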
The MIT license without regional restrictions is a deliberate positioning move. Several competing open models carry usage restrictions or regional carve-outs that limit enterprise adoption. Z.AI is explicitly removing those friction points.
At 62.1% on SWE-bench Pro, GLM-5.2 is competitive with leading proprietary models on software engineering tasks, a benchmark category that closely tracks practical developer work. For teams that want frontier-class coding performance from a model they can run on their own infrastructure, GLM-5.2 is now a credible option.
Corroborating sources
- Hugging Face: https://huggingface.co/zai-org/GLM-5.2
  “A solid 1M-token context that stably sustains long-horizon work”