Research & Benchmarks

Papers, findings, evals, and leaderboard results — the science behind the models.

29 stories

Today

Research · Jun 19, 2026 · 2 min read

Google releases DiffusionGemma, a 26B MoE open model generating text via parallel diffusion at over 1100 tokens per second on H100

Google on June 10, 2026 released DiffusionGemma, an experimental open-weights language model that generates text through a fundamentally different mechanism than standard autoregressive models. While…

Yesterday

Research · Jun 19, 2026 · 2 min read

OpenAI o3 helps Boston Children's Hospital diagnose 18 previously unsolved rare genetic diseases in children

Researchers at Boston Children's Hospital have used OpenAI's o3 reasoning model to identify new diagnoses for 18 children whose rare genetic diseases had gone unresolved by standard clinical workup.…

OpenAI

Research · Jun 18, 2026 · 2 min read

OpenAI alignment research: beneficial-trait RL training generalizes safety improvements across 83% of benchmarks

OpenAI's alignment research team published findings today showing that reinforcement learning applied to a targeted set of "beneficial trait" training scenarios produces safety improvements that…

OpenAI

Earlier this week

Research · Jun 18, 2026 · 2 min read

OpenAI and Molecule.one demonstrate near-autonomous AI chemist improving medicinal chemistry reaction yields

OpenAI and Molecule.one published research on June 17, 2026 showing a near-autonomous AI system successfully improved a challenging reaction in medicinal chemistry — the first time an AI has tackled…

OpenAI

Benchmark · Jun 17, 2026 · 2 min read

ChatGPT market share falls below 50% for the first time as Gemini and Claude gain ground

ChatGPT has lost its majority position in the AI assistant market for the first time since its 2022 launch, dropping to 46.4% market share as of May 2026, according to Sensor Tower's State of AI…

Research · Jun 17, 2026 · 2 min read

Google AMIE matches primary care physicians in chronic disease management, outperforms on plan preciseness in Nature study

Google's AMIE (Articulate Medical Intelligence Explorer) has matched the performance of primary care physicians in managing chronic health conditions and significantly outperformed them in plan…

Google

Research · Jun 17, 2026 · 2 min read

Anthropic research: domain expertise—not coding background—predicts success with Claude Code, based on 400,000 real sessions

Anthropic published research on June 16 showing that the single strongest predictor of success with Claude Code is domain expertise in the task at hand — not whether a user has a background in…

Anthropic

Earlier

Research · Jun 16, 2026 · 2 min read

Alibaba releases Qwen-RobotWorld: a language-conditioned video world model ranking first on EWMBench and DreamGen Bench

Alibaba's Qwen team released a technical report on June 15, 2026 introducing Qwen-RobotWorld, a foundation model for embodied AI that predicts future visual trajectories across robotics, autonomous…

Benchmark · Jun 16, 2026 · 2 min read

NVIDIA Blackwell sweeps all seven MLPerf Training 6.0 benchmarks, the only platform with full-suite submissions

NVIDIA's Blackwell GPU platform claimed the top spot across all seven benchmarks in MLPerf Training 6.0, the latest edition of the peer-reviewed AI training benchmark suite. The results, published…

Research · Jun 14, 2026 · 2 min read

Anthropic research finds deterministic retrieval tools are the key bottleneck for AI agents in biology, not model capability

Anthropic published research on June 8, 2026, examining why AI agents struggle to reliably retrieve biological data — and how a single deterministic tool layer nearly eliminates the performance gap…

Anthropic

Benchmark · Jun 12, 2026 · 2 min read

NVIDIA Blackwell leads Artificial Analysis AgentPerf — the first agentic AI infrastructure benchmark

NVIDIA's Blackwell architecture has taken the top spot in AgentPerf, the first benchmark designed specifically to measure agentic AI infrastructure performance, according to results published June 12…

Research · Jun 12, 2026 · 3 min read

Anthropic launches Public Record survey: 64% of Americans fear AI job loss, only 15% trust AI companies

Anthropic on June 12 released the inaugural results from its Anthropic Public Record, a new survey series the company says it will repeat to track how U.S. public attitudes toward AI shift over time.…

Anthropic

Research · Jun 11, 2026 · 3 min read

Alibaba's Qwen team publishes Qwen-Image-Flash, bringing few-step distillation to text-to-image generation and instruction-guided editing

Alibaba's Qwen research team has published Qwen-Image-Flash: Beyond Objective Design on arXiv, presenting a few-step distillation framework that accelerates the Qwen-Image-2.0 model family for both…

Research · Jun 10, 2026 · 2 min read

DeepSeek paper: Lookahead Sparse Attention cuts KV cache to 13.5% at 500K-token scales

DeepSeek researchers published a paper on June 9, 2026 introducing Lookahead Sparse Attention (LSA), an inference-time technique that compresses the GPU memory footprint of long-context LLM serving…

Benchmark · Jun 10, 2026 · 3 min read

Vercel's May 2026 AI gateway data: Anthropic holds 65% of spend as DeepSeek surges to 17% of token volume

Vercel published its AI gateway production index for May 2026, drawing on routing traffic across its infrastructure to document a widening divergence in the AI provider market: DeepSeek's share of…

Research · Jun 8, 2026 · 3 min read

Anthropic disrupts first AI-orchestrated cyber espionage campaign, attributed to Chinese state-sponsored group

Anthropic disclosed on June 3, 2026 that it detected and disrupted what it describes as the first large-scale cyberattack executed largely without human intervention — a campaign in which a Chinese…

Anthropic

Research · Jun 7, 2026 · 2 min read

Google DeepMind's Running Guide agent navigates blind and low-vision runners outdoors using on-device Gemma 4

Google DeepMind published details on May 20, 2026 of an accessibility agent that guides blind and low-vision athletes while running outdoors in real time, using a Pixel 10 Pro smartphone worn on the…

Google

Research · Jun 7, 2026 · 3 min read

Anthropic benchmarks Claude against ChemDraw on NMR spectroscopy, finds comparable accuracy

Anthropic on June 5 published benchmark data showing that Claude Opus 4.7 — a general-purpose frontier model with no chemistry-specific fine-tuning — performs comparably to specialized NMR software…

Anthropic

Research · Jun 7, 2026 · 3 min read

Google DeepMind Co-Scientist helps MIT biologists identify genetic factors that reverse cellular aging in days, not months

Google DeepMind's AI research assistant Co-Scientist helped biologists at MIT identify more than 20 genetic factors that may reverse cellular aging — and lab experiments confirmed that several of…

Google

Research · Jun 5, 2026 · 3 min read

DeepMind researchers argue solipsistic AI design produces uncooperative superintelligence

A team of researchers including prominent Google DeepMind scientists published a paper at the 43rd International Conference on Machine Learning (ICML 2026) arguing that the dominant paradigm in AI…