Sentinel benchmarks web-page token bloat
Sentinel compares naive HTML-to-text with a structural extraction pipeline across 100 pages in news, ecommerce, docs, social, and SaaS categories. Across the 83 accessible URLs, it cut token volume by 71.5% on average, but the answer-quality results were mixed rather than uniformly better.
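The mechanics of the comparison are easy to reproduce. Below is a minimal sketch of the two pipelines being measured, assuming BeautifulSoup for the naive pass, trafilatura as a stand-in structural extractor, and tiktoken for token counts; Sentinel's actual extraction code is not published in this summary, so all three library choices are illustrative assumptions.

```python
# Minimal sketch: naive HTML-to-text vs. structure-aware extraction,
# compared by token count. Library choices are assumptions, not Sentinel's.
import tiktoken
import trafilatura
from bs4 import BeautifulSoup

enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    # disallowed_special=() lets special-token-like strings pass as plain text
    return len(enc.encode(text, disallowed_special=()))

def naive_tokens(html: str) -> int:
    # Naive pass: strip tags, keep everything (nav, footers, cookie banners).
    text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)
    return count_tokens(text)

def structural_tokens(html: str) -> int:
    # Structure-aware pass: keep main content, drop boilerplate.
    text = trafilatura.extract(html) or ""
    return count_tokens(text)

def token_reduction(html: str) -> float:
    # Fraction of tokens removed by structural extraction for one page.
    naive = naive_tokens(html)
    return 1.0 - structural_tokens(html) / naive if naive else 0.0
```

Averaging `token_reduction` over the accessible pages is the shape of the headline 71.5% figure; the per-category spread noted below falls out of grouping the same number by page type.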
The core result is useful: most web pages are still packed with context-window waste, and structure-aware extraction can strip much of it without obvious catastrophic loss. The weaker part is also the honest part: the judge-based answer-quality (AQD) signal shows that compression and usefulness do not move together cleanly.
- 17/100 pages were blocked by bot defenses, which matters because extraction benchmarks on the open web are partially measuring accessibility policy, not just content quality
- Category spread is informative: news and ecommerce benefit most, while docs and SaaS are less redundant, and social pages vary widely
- The LLM-as-judge setup is pragmatic but coarse; one category-level question per page will miss nuanced regressions and may inflate ties (see the sketch after this list)
- The Claude Code compression-layer anecdote is a real caveat for anyone benchmarking inside hosted agent harnesses, but it should be independently verified before being treated as fact
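To make the judge's coarseness concrete, here is a hypothetical shape of the per-page judging step, assuming an OpenAI-style chat client and a pairwise comparison of answers from the full and extracted texts. The model name, prompt, and three-way verdict are illustrative, not Sentinel's published setup.

```python
# Hypothetical per-page judge call. The OpenAI client, model name, and
# rubric are assumptions for illustration, not Sentinel's actual setup.
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are grading two answers to the same question about a web page.

Question: {question}

Answer A (from full page text):
{answer_full}

Answer B (from extracted text):
{answer_extracted}

Reply with exactly one token: A_BETTER, B_BETTER, or TIE."""

def judge_page(question: str, answer_full: str, answer_extracted: str) -> str:
    # One coarse verdict per page: nuanced regressions (a dropped caveat,
    # a missing table row) can collapse into TIE, inflating tie counts.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative judge model
        temperature=0,
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(
                question=question,
                answer_full=answer_full,
                answer_extracted=answer_extracted,
            ),
        }],
    )
    return resp.choices[0].message.content.strip()
```

A single categorical verdict over one category-level question is cheap to run at 100-page scale, which is presumably the appeal, but it trades away exactly the resolution needed to tell "harmless compression" from "quietly lost detail."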
Discovered: 2026-05-08
Published: 2026-05-08
Author: Glittering_Painting8