Reddit thread points to Humanity's Last Exam
A Redditor asks for free, difficult online tests or certifications that can probe AI across coding, cyberdefense, DevOps, and other domains. The lone reply points to Humanity's Last Exam, a broad benchmark built to stress expert-level reasoning rather than credential prep.
This is less a search for a certificate and more a search for an eval harness, and HLE shows how far benchmarks still are from a true skills report for models.
- –HLE spans 2,500 expert-authored questions across 100+ subjects and includes multimodal items, so it is broad and genuinely hard.
- –Its authors frame it as a measure of structured academic capability, not autonomous research or creative problem-solving.
- –Because the benchmark is fixed and closed-ended, it can rank models but not produce the personalized weak-area report the Redditor wants.
- –That leaves room for subject-specific eval products with scoring, explanations, and gap analysis across coding, security, and DevOps.
DISCOVERED
65d ago
2026-03-23
PUBLISHED
65d ago
2026-03-23
RELEVANCE
AUTHOR
unknown-one