OPEN_SOURCE
REDDIT // 24d ago · PRODUCT LAUNCH
Manning launches Evaluation and Alignment book
Manning’s new MEAP by Hanchung Lee collects seminal papers on how to evaluate and align AI systems, moving from BLEU and ROUGE to BERTScore, COMET, LLM-as-a-judge, RLHF, constitutional AI, and red teaming. The r/MachineLearning launch also includes a 50% discount code, MLLEE450RE.
// ANALYSIS
Hot take: this is less a book about metrics and more a reminder that most real ML failures are specification failures.
- It treats evaluation as a design choice, not an afterthought, which is exactly how production teams should think about LLMs.
- The chapter lineup mirrors the field’s evolution: lexical scoring, semantic similarity, judgment-based evaluation, then alignment loops.
- The MEAP format makes it a living resource rather than a finished textbook, which suits a fast-moving topic like LLM evaluation.
- The practitioner focus makes it useful for teams trying to align helpfulness, safety, and consistency around a shared evaluation language.
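The evolution the second bullet describes starts with lexical scoring, and its core limitation is easy to demonstrate. Below is a minimal sketch (not from the book) of clipped unigram precision, the BLEU-1 building block, showing how a paraphrase that preserves meaning can still score near zero on word overlap; the example sentences are invented for illustration:

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision: the fraction of candidate tokens
    that also appear in the reference (BLEU-1 without the brevity
    penalty). Counts are clipped so repeated tokens can't inflate
    the score beyond their frequency in the reference."""
    cand_tokens = candidate.lower().split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(cand_tokens)
    matched = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    return matched / len(cand_tokens)

reference = "the model answers the question correctly"

# Verbatim match: perfect lexical score.
print(unigram_precision("the model answers the question correctly", reference))  # 1.0

# Paraphrase: same meaning, almost no word overlap, so the score collapses.
print(unigram_precision("it responds to the query accurately", reference))
```

This failure mode is exactly what pushed the field toward semantic similarity (BERTScore, COMET) and judgment-based evaluation, where meaning rather than surface overlap is scored.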
// TAGS
llm · safety · research · benchmark · evaluation-and-alignment
DISCOVERED
2026-03-18
PUBLISHED
2026-03-18
RELEVANCE
7/10
AUTHOR
ManningBooks