OPEN_SOURCE
REDDIT // 11d ago · TUTORIAL
OpenMed trains mRNA models across 25 species
OpenMed published a deep technical walkthrough of its protein AI pipeline, spanning ESMFold and ProteinMPNN through codon optimization. The standout result is CodonRoBERTa-large-v2, which reached a perplexity of 4.10 and a CAI Spearman correlation of 0.404 while scaling to 25 species in 55 GPU-hours.
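For context on the headline metric: perplexity is the exponential of the mean per-token negative log-likelihood, so a codon-level perplexity of 4.10 means the model is, on average, about as uncertain as a uniform choice over roughly four codons. A minimal sketch (the function name and inputs are illustrative, not from the article):

```python
import math

def perplexity(nlls: list[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood), NLLs in nats per codon."""
    return math.exp(sum(nlls) / len(nlls))

# A reported perplexity of 4.10 corresponds to a mean NLL of ln(4.10) ≈ 1.41 nats.
```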
// ANALYSIS
The interesting part here is not just that OpenMed trained another biological language model, but that a classic RoBERTa-style stack beat a more modern transformer on codon data.
- CodonRoBERTa-large-v2 clearly beat ModernBERT, which suggests biology-specific inductive bias still matters more than the latest NLP architecture trends
- The jump from low CAI correlation to 0.404 after hyperparameter tuning is the real win, because it moves the model from “predictive” to biologically useful
- Training 4 production models across 25 species for $165 is a strong signal that specialized biomedical modeling is becoming cheap enough for small teams
- The species-conditioned setup is the most defensible product angle here, since it turns a single codon model into a multi-organism system
- The article reads more like a reproducible research notebook than a marketing post, which makes the claims easier to trust
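The CAI Spearman figure cited above can be made concrete: CAI (Codon Adaptation Index) is the geometric mean of per-codon relative-adaptiveness weights, and the metric is the rank correlation between model scores and reference CAI values. A stdlib-only sketch with hypothetical weights for the four alanine codons (real weights are derived from highly expressed genes of the target species; none of the values below come from the article):

```python
import math

# Hypothetical relative-adaptiveness weights w_c (assumption, illustration only).
WEIGHTS = {"GCU": 1.0, "GCC": 0.6, "GCA": 0.3, "GCG": 0.1}

def cai(seq: str) -> float:
    """Codon Adaptation Index: geometric mean of per-codon weights."""
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    logs = [math.log(WEIGHTS[c]) for c in codons]
    return math.exp(sum(logs) / len(logs))

def spearman(xs: list[float], ys: list[float]) -> float:
    """Spearman rank correlation (no tie handling, for illustration)."""
    def ranks(v: list[float]) -> list[float]:
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

A Spearman of 0.404 between model scores and CAI means the model's ranking of candidate sequences agrees with the CAI ranking moderately well, which is the property that matters for picking optimized codons.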
// TAGS
openmed · open-source · research · benchmark · gpu · llm
DISCOVERED
2026-04-01
PUBLISHED
2026-03-31
RELEVANCE
7/10
AUTHOR
dark-night-rises