
OpenAI updates GPT-Rosalind with new life sciences benchmarks and Codex plugins

Jun 3, 2026; published Jun 4, 2026

OpenAI on June 3, 2026 shipped a major update to GPT-Rosalind, its life-sciences reasoning model, alongside two production plugins for Codex that turn the model's intelligence into repeatable scientific workflows. The update introduces four new expert-judged benchmarks (LifeSciBench, MedChemBench, GeneBench, LabWorkBench), reports accuracy gains over GPT-5.5 at lower token cost on the three benchmarks with published scores, and makes both plugins available to all Codex users.

What's new

Model update. OpenAI introduces a new GPT-Rosalind model update "purpose-built for life sciences research at enterprise scale." It combines GPT-5.5's agentic coding and tool use with stronger model intelligence in medicinal chemistry, genomics, and broader life-sciences design and experimental workflows.
LifeSciBench. OpenAI's new externally expert-judged benchmark draws tasks from six life-sciences workflow areas: evidence handling, analysis, design and optimization, scientific reasoning, validation and operations, and translation and communication.
MedChemBench. A new medicinal-chemistry workflow benchmark covering multimodal chemical-structure understanding, SAR, drug potency/toxicity/ADME prediction, multi-parameter lead optimization, and retrosynthesis. GPT-Rosalind scores 27.5% versus GPT-5.5 at 25.1%, while using 7.2% fewer tokens.
GeneBench. OpenAI's agentic evaluation on long-horizon, end-to-end genomics and quantitative-biology analysis. GPT-Rosalind achieves 21.6% accuracy versus GPT-5.5's 20.4% while using 31% fewer tokens.
LabWorkBench. A real wet-lab protocol benchmark testing the model's ability to link perturbations to experimental outcomes. GPT-Rosalind scores 63.2% versus GPT-5.5 at 55.8%, using 5.3% fewer tokens. Data are proprietary and thus uncontaminated.
Two new Codex plugins. A Life Sciences Research plugin (sourced evidence retrieval and biological interpretation) and a Life Sciences NGS Analysis plugin (bioinformatics execution including ctDNA review, scRNA-seq QC and annotation, and bulk RNA-seq FASTQ QC) are now available to all Codex users. Qualified GPT-Rosalind enterprise users can additionally power both plugins with GPT-Rosalind itself.
Interactive scientific viewers. Codex now includes native sequence, alignment, and structure viewers so a scientist can inspect mutant residues, conservation, and inhibitor-bound pockets in the same workspace where GPT-Rosalind is reasoning.
Availability. GPT-Rosalind remains a gated research preview, available globally to eligible organizations through OpenAI's trusted-access deployment structure.

Context

GPT-Rosalind was first introduced in April 2026 as a frontier reasoning model for biology, drug discovery, and translational medicine. The June 3 update is the model series' first significant capability refresh and is paired with productized workflow tooling rather than shipped as a standalone model release. OpenAI continues to gate the model itself behind its trusted-access program — consistent with how it handled the April launch — while pushing the Codex-side plugins out to every user as the on-ramp.

The four new benchmarks are notable because each one was built in-house to reflect realistic life-sciences workflows rather than reuse existing academic suites. LabWorkBench in particular is described as drawing on proprietary wet-lab data that has not been contaminated by prior training, which is a credibility play in a category where benchmark contamination has been an open critique.

Why it matters

This is the clearest signal yet that OpenAI is investing in vertical, domain-specialized model series rather than only general-purpose frontier scaling. GPT-Rosalind sits alongside Codex as a second productized vertical surface — life sciences research — and the new benchmark slate plus plugin availability give OpenAI the beginning of a defensible competitive moat in pharma and academic biology, both of which have historically been slow to adopt general-purpose chat models for production work.

The accuracy-plus-token-efficiency framing across the three benchmarks with reported scores (MedChemBench, GeneBench, and LabWorkBench, each showing higher accuracy and 5.3% to 31% fewer tokens than GPT-5.5) also matters for budget-sensitive lab workflows where multi-step agentic runs can rack up cost quickly. And the plugin layer (sourced evidence retrieval and bioinformatics execution shipped to all Codex users) broadens OpenAI's surface into a researcher's everyday workspace even when the underlying model running the plugin is not yet GPT-Rosalind.
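To put the reported numbers side by side, here is a minimal Python sketch that derives the absolute gain (in percentage points) and the relative gain over GPT-5.5 for MedChemBench, GeneBench, and LabWorkBench. The scores and token savings are the figures quoted in this article. The dictionary layout and the `summarize` helper are illustrative assumptions, not anything OpenAI publishes.

```python
# Figures as reported in this article (all values are percentages).
# The structure and names below are hypothetical, chosen for illustration.
REPORTED = {
    "MedChemBench": {"rosalind": 27.5, "gpt55": 25.1, "token_savings": 7.2},
    "GeneBench": {"rosalind": 21.6, "gpt55": 20.4, "token_savings": 31.0},
    "LabWorkBench": {"rosalind": 63.2, "gpt55": 55.8, "token_savings": 5.3},
}


def summarize(reported):
    """Return absolute (percentage-point) and relative (%) accuracy gains."""
    rows = {}
    for name, r in reported.items():
        abs_gain = r["rosalind"] - r["gpt55"]      # percentage points
        rel_gain = 100.0 * abs_gain / r["gpt55"]   # percent of the GPT-5.5 score
        rows[name] = {
            "abs_gain_pp": round(abs_gain, 1),
            "rel_gain_pct": round(rel_gain, 1),
            "token_savings_pct": r["token_savings"],
        }
    return rows


for name, row in summarize(REPORTED).items():
    print(name, row)
```

On these figures, the largest accuracy gain (LabWorkBench, 7.4 points) comes with the smallest token saving, while the largest token saving (GeneBench, 31%) comes with the smallest accuracy gain (1.2 points). The "5.3% to 31%" range therefore describes separate benchmarks, not a uniform improvement.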

For enterprise life-sciences buyers, the practical takeaway is that the gate to evaluate the plugins drops to zero today, while serious GPT-Rosalind access still routes through OpenAI's trusted-access intake.

Corroborating sources

VentureBeat
https://venturebeat.com/technology/openai-debuts-gpt-rosalind-a-new-limited-access-model-for-life-sciences-and-broader-codex-plugin-on-github
OpenAI
https://openai.com/index/introducing-new-capabilities-to-gpt-rosalind/
“Progress in life sciences depends on synthesizing data and evidence across scales and modalities: molecules, genes, pathways, and living systems.”
