
Anthropic Benchmarks Claude Against ChemDraw on NMR Spectroscopy, Finds Comparable Accuracy

Jun 5, 2026 (published Jun 7, 2026)

Anthropic on June 5 published benchmark data showing that Claude Opus 4.7 — a general-purpose frontier model with no chemistry-specific fine-tuning — performs comparably to specialized NMR software on nuclear magnetic resonance spectroscopy tasks, and can work the problem in reverse: proposing molecular structures directly from spectral data alone.

What's new

Anthropic chemist David Kamber tested three Claude models (Opus 4.7, Opus 4.6, Sonnet 4.6) against ChemDraw and MestReNova on 20 compounds drawn from synthetic chemistry preprints published after the models' training cutoff. This design choice was intended to rule out memorized answers and to prevent selection bias.

Forward prediction (structure → spectrum):

Claude Opus 4.7 achieved an average hydrogen peak error of ±0.079 ppm, placing it within or near the tolerance window more often than any other tool tested
On carbon prediction, Opus 4.7 and MestReNova were "effectively tied, at ±1.37 and ±1.48 ppm"
On NMR peak splitting patterns, all three Claude models predicted sub-peak spacing (the coupling constant J) to within half a hertz approximately 80% of the time, versus 26–35% for ChemDraw and MestReNova
Opus 4.7 was the most consistent across three repeat runs
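The accuracy figures above reduce to two simple statistics: a mean absolute error between predicted and experimental values, and the fraction of predictions that fall inside a tolerance window. The Python sketch below assumes that is how the reported numbers are defined (the source does not give formulas). The toy shifts and couplings are invented for illustration, the 0.1 ppm proton tolerance is a hypothetical placeholder, and only the 0.5 Hz coupling criterion comes from the text.

```python
# Illustrative sketch only: the tolerance for 1H shifts and all toy data below
# are hypothetical and are NOT taken from Anthropic's benchmark.


def mean_abs_error(pairs):
    """Mean absolute deviation over (predicted, experimental) pairs."""
    return sum(abs(pred - exp) for pred, exp in pairs) / len(pairs)


def fraction_within(pairs, tolerance):
    """Fraction of (predicted, experimental) pairs whose error is <= tolerance."""
    hits = sum(1 for pred, exp in pairs if abs(pred - exp) <= tolerance)
    return hits / len(pairs)


# Toy 1H chemical shifts in ppm: (predicted, experimental).
h_shifts = [(7.31, 7.26), (3.85, 3.91), (2.10, 1.95), (1.25, 1.27)]

# Toy coupling constants (sub-peak spacings) in Hz: (predicted, experimental).
j_couplings = [(8.1, 8.4), (7.0, 7.1), (1.5, 2.2), (15.9, 16.0), (12.0, 13.0)]

H_TOLERANCE_PPM = 0.1  # hypothetical tolerance window
J_TOLERANCE_HZ = 0.5   # the "within half a hertz" criterion from the text

h_mae = mean_abs_error(h_shifts)
h_hit_rate = fraction_within(h_shifts, H_TOLERANCE_PPM)
j_hit_rate = fraction_within(j_couplings, J_TOLERANCE_HZ)

print(f"1H mean absolute error: {h_mae:.3f} ppm")
print(f"1H within {H_TOLERANCE_PPM} ppm: {h_hit_rate:.0%}")
print(f"J within {J_TOLERANCE_HZ} Hz: {j_hit_rate:.0%}")
```

Run as-is, it reports a 0.070 ppm mean error, 75% of proton shifts within the placeholder tolerance, and 60% of couplings within 0.5 Hz. Note that a low mean error and a high hit rate are separate claims: one large miss raises the mean without changing the hit rate much.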

Inverse prediction (spectrum → structure):

Anthropic tested Opus 4.7 on 15 structure-elucidation problems given only a molecular formula (from high-resolution mass spectrometry) and 1D NMR spectra:

All 8 simpler targets (single-ring or two-fragment molecules) were recovered correctly on every attempt from spectra and formula alone
On 7 harder targets (fused rings, spirocycles) provided with a starting-material hint, Opus 4.7 returned the correct structure on all three runs for 4 of 7, and on at least two of three runs for the remainder
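The source does not describe how Opus 4.7 uses the molecular formula. As a generic illustration of why an HRMS formula narrows the search, a standard first step in manual structure elucidation is to compute the degree of unsaturation (rings plus π bonds), DBE = (2C + 2 + N − H − X)/2, where X counts halogens. This Python sketch assumes a flat, neutral formula without parentheses, charges, or isotopes; `parse_formula` and `degree_of_unsaturation` are hypothetical helper names, not part of any tool in the benchmark.

```python
import re
from fractions import Fraction

HALOGENS = ("F", "Cl", "Br", "I")


def parse_formula(formula):
    """Parse a flat formula such as 'C9H10O2' into {element: count}.

    Parentheses, charges and isotope labels are not handled.
    """
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def degree_of_unsaturation(formula):
    """Rings plus pi bonds implied by a neutral molecular formula.

    Divalent atoms (O, S) do not change the count; halogens behave like H,
    and trivalent N adds half a unit. A non-integer result signals a radical
    or an inconsistent formula.
    """
    c = parse_formula(formula)
    carbons = c.get("C", 0)
    hydrogens = c.get("H", 0)
    nitrogens = c.get("N", 0)
    halogens = sum(c.get(x, 0) for x in HALOGENS)
    return Fraction(2 * carbons + 2 + nitrogens - hydrogens - halogens, 2)


print(degree_of_unsaturation("C9H10O2"))  # → 5 (e.g. one ring + three C=C + one C=O)
print(degree_of_unsaturation("C5H5N"))    # → 4
```

Returning a `Fraction` rather than a float keeps half-integer results (such as 9/2 for "C7H7") visible, which is a quick sanity check that the formula describes a closed-shell molecule.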

Limitations acknowledged: The evaluation covers 20 compounds across 4 scaffold classes, uses 1D NMR only, and tests three solvents (DMSO-d₆, CDCl₃, D₂O). 2D experiments, stereochemistry, and complex natural products are explicitly out of scope. Anthropic describes the results as "indicative rather than precise" and notes that a rigorous evaluation would cover several hundred compounds across 20–30 scaffold classes.

Context

Chemistry AI has long promised more than it has delivered in practice. Tools for retrosynthesis and reaction prediction have existed for years, but adoption in academic and small-lab settings has been limited. The data problems are real: chemical datasets are sparse on null results, inconsistent in format, and largely paywalled.

What has changed is the multimodal reasoning capability of frontier models. Claude can read a chemical structure from a hand sketch or journal figure, work through spectral assignments step by step in auditable form, and handle the range of representations — SMILES strings, NMR peak lists, structural drawings — that chemists actually use. CAS, the largest chemistry registry, catalogs over 290 million disclosed substances and adds roughly 15,000 new ones per day — a scale where AI assistance in translation and lookup has practical value.
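The NMR peak lists mentioned above usually follow the compact convention used in the experimental sections of chemistry papers: each signal's shift, followed in parentheses by multiplicity, coupling constants, and integration. The Python sketch below builds such a string from structured data. The `Peak` class, `format_peak_list`, and the toy signals are hypothetical, and individual journals vary in the details of this style.

```python
from dataclasses import dataclass, field


@dataclass
class Peak:
    """One 1H NMR signal (hypothetical structure for illustration)."""
    shift_ppm: float
    multiplicity: str  # e.g. "s", "d", "t", "q", "dd", "m"
    integration: int   # number of protons in the signal
    j_hz: list = field(default_factory=list)  # coupling constants in Hz

    def as_text(self):
        """Render as e.g. '4.36 (q, J = 7.1 Hz, 2H)'."""
        parts = [self.multiplicity]
        if self.j_hz:
            parts.append("J = " + ", ".join(f"{j:.1f}" for j in self.j_hz) + " Hz")
        parts.append(f"{self.integration}H")
        return f"{self.shift_ppm:.2f} ({', '.join(parts)})"


def format_peak_list(peaks, field_mhz, solvent):
    """Render peaks in descending-shift order as a one-line 1H peak list."""
    ordered = sorted(peaks, key=lambda p: p.shift_ppm, reverse=True)
    body = ", ".join(p.as_text() for p in ordered)
    return f"1H NMR ({field_mhz} MHz, {solvent}) δ {body}."


# Toy signals, not data from the benchmark.
peaks = [
    Peak(1.38, "t", 3, [7.1]),
    Peak(4.36, "q", 2, [7.1]),
    Peak(7.26, "s", 5),
]
print(format_peak_list(peaks, 400, "CDCl3"))
```

This prints `1H NMR (400 MHz, CDCl3) δ 7.26 (s, 5H), 4.36 (q, J = 7.1 Hz, 2H), 1.38 (t, J = 7.1 Hz, 3H).`, the same kind of text a chemist would paste into a chat interface.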

Why it matters

The most significant result is inverse prediction. Dedicated structure-elucidation software has existed for decades, but it typically requires 2D NMR data, specialized setup, and licensed tools. Claude recovers structures from the same 1D peak list a chemist would paste into a chat interface, with no setup required. For small labs without specialized software, this makes a previously inaccessible task tractable.

On forward prediction, Opus 4.7 matches or exceeds ChemDraw and MestReNova on average despite having no domain-specific training. That suggests general frontier models can serve as quantitative tools in applied chemistry workflows, not only as qualitative assistants.

Anthropic is expanding its AI for Science program to support chemistry research and invites interested researchers to contact [email protected].

Corroborating sources

Anthropic
https://www.anthropic.com/research/making-claude-a-chemist
“Ultimately, our claim is a modest one: Claude is starting to meaningfully assist chemists with the daily translation, recall, and integration work that complements their judgment, and we plan to keep extending its helpfulness.”
