Alconost drops pro-grade MQM gold dataset
OPEN_SOURCE
REDDIT · 25d ago · OPEN-SOURCE RELEASE


Alconost has open-sourced its MQM-annotated MT evaluation dataset on Hugging Face (https://huggingface.co/datasets/alconost/mqm-translation-gold), with 362 segments across 16 language pairs and annotations from 48 professional linguists. The accompanying announcement (https://www.reddit.com/r/MachineLearning/comments/1rw3a3j/d_releasing_a_professional_mqmannotated_mt/) positions it as a WMT-aligned, higher-agreement alternative to noisier crowdsourced test sets.
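To make the annotation structure concrete, here is a minimal sketch of how segment-level MQM scores are commonly aggregated from records like those described (category, severity, span, multiple annotators). The field names, sample data, and severity weights below are assumptions for illustration, not the dataset's actual schema; the minor=1 / major=5 weighting is one common MQM convention and may differ from what Alconost used.

```python
# Hypothetical MQM annotation records mirroring the described structure:
# each error carries a category, a severity, and a character span.
# Severity weights follow a common MQM convention (minor=1, major=5);
# the dataset's actual weighting scheme may differ.
SEVERITY_WEIGHTS = {"minor": 1, "major": 5}

def mqm_penalty(errors):
    """Sum severity weights over all annotated errors in one segment."""
    return sum(SEVERITY_WEIGHTS[e["severity"]] for e in errors)

# Two annotators' error lists for the same segment (illustrative data only).
annotator_a = [
    {"category": "accuracy/mistranslation", "severity": "major", "span": (4, 12)},
    {"category": "fluency/grammar", "severity": "minor", "span": (20, 25)},
]
annotator_b = [
    {"category": "accuracy/mistranslation", "severity": "major", "span": (4, 12)},
]

# Average the per-annotator penalties to get a segment-level score.
scores = [mqm_penalty(a) for a in (annotator_a, annotator_b)]
segment_score = sum(scores) / len(scores)
print(segment_score)  # 5.5
```

Having multiple annotators per segment, as this dataset does, is what makes averaging (and agreement analysis) like this possible at all.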

// ANALYSIS

This is the kind of small-but-clean dataset release that can punch above its size in MT eval workflows.

  • The data includes full MQM structure (category, severity, span) plus multiple annotators per segment, which is unusually useful for agreement and metric-analysis work.
  • Reported inter-annotator agreement of Kendall’s τ = 0.317 is materially above the typical WMT ranges cited by the authors, suggesting quality control in the annotation process was a priority.
  • The dataset is better suited for benchmarking, error analysis, and reward-model calibration than for training large translation models due to its scale.
  • License is CC BY-SA 4.0 and the card notes it is a growing collection, so it could become a recurring reference set if updates continue.
  • If replicated by others, this could raise expectations for professionally annotated open MT eval resources.
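Rank correlations like the τ figure above are straightforward to compute; as a minimal sketch, here is Kendall's τ-a (no tie handling; published evaluations typically use τ-b or τ-c) between a metric's segment scores and human judgments, using invented numbers rather than anything from the dataset:

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.
    Assumes no ties; real evaluations typically use tau-b or tau-c."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Illustrative quality scores (higher = better), not dataset values.
metric_scores = [0.91, 0.85, 0.78, 0.70, 0.66]
human_scores  = [5, 3, 4, 1, 2]
print(kendall_tau_a(metric_scores, human_scores))  # 0.6
```

A value near 0.3, like the one reported, indicates modest but meaningful agreement at the segment level, which is in line with what the authors compare against.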
// TAGS
mqm-translation-gold · machine-translation · llm · benchmark · research · open-source · human-evaluation

DISCOVERED

2026-03-17 (25d ago)

PUBLISHED

2026-03-17 (26d ago)

RELEVANCE

8 / 10

AUTHOR

ritis88