LOBSTERS // 40d ago · NEWS
ACM survey maps why AI-text detection keeps breaking
This Communications of the ACM piece synthesizes the state of LLM-text detection, covering black-box classifiers, white-box watermarking, benchmark datasets, and adversarial attacks. Its core message is that headline accuracy numbers hide fragile real-world performance, especially under paraphrasing attacks, dataset bias, and low-false-positive constraints.
// ANALYSIS
Detection is maturing as a research field, but this survey makes clear that deployment-grade reliability still lags model progress.
- Black-box detectors can perform well in-domain, yet often overfit artifacts in curated datasets and generalize poorly.
- White-box watermarking improves provenance tracing but introduces tradeoffs in text quality and can be attacked with adaptive querying.
- Paraphrasing remains a practical evasion path, showing that many current detectors are brittle against low-cost adversarial edits.
- The authors argue evaluation should emphasize true-positive rates at very low false-positive rates, not just aggregate AUC/accuracy.
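The last point, about low-false-positive evaluation, can be made concrete with a minimal sketch. It is not taken from the survey: the function name `tpr_at_fpr`, the convention that a higher score means "more likely machine-generated", and the toy scores are all assumptions chosen for illustration.

```python
def tpr_at_fpr(human_scores, ai_scores, max_fpr):
    """Return (tpr, threshold) for the loosest threshold whose
    false-positive rate on human text does not exceed max_fpr.

    Assumption: higher score = more likely machine-generated, and a
    text is flagged when its score is strictly above the threshold.
    """
    negatives = sorted(human_scores, reverse=True)
    # Number of human texts we are allowed to flag by mistake (rounded down).
    allowed_fp = int(max_fpr * len(negatives))
    if allowed_fp < len(negatives):
        # Only scores strictly above this value are flagged, so at most
        # allowed_fp human texts can be caught.
        threshold = negatives[allowed_fp]
    else:
        threshold = float("-inf")
    flagged_ai = sum(score > threshold for score in ai_scores)
    return flagged_ai / len(ai_scores), threshold


def accuracy(human_scores, ai_scores, threshold):
    """Plain accuracy at a fixed threshold, for comparison."""
    correct = sum(s <= threshold for s in human_scores)
    correct += sum(s > threshold for s in ai_scores)
    return correct / (len(human_scores) + len(ai_scores))


# Toy, made-up detector scores (illustrative only).
human = [0.1, 0.2, 0.3, 0.9]
ai = [0.4, 0.5, 0.8, 0.95]

strict_tpr, strict_threshold = tpr_at_fpr(human, ai, max_fpr=0.0)
loose_tpr, loose_threshold = tpr_at_fpr(human, ai, max_fpr=0.25)
print("accuracy at threshold 0.35:", accuracy(human, ai, 0.35))
print("TPR at 0% FPR:", strict_tpr)
print("TPR at 25% FPR:", loose_tpr)
```

On this toy data the detector is right on 7 of 8 texts at a threshold of 0.35, yet it catches only 1 of 4 machine-written texts once no human text may be flagged. That gap is the kind of fragility a single accuracy or AUC figure can hide.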
// TAGS
llm · research · safety · ai-coding · the-science-of-detecting-llm-generated-text
DISCOVERED
40d ago
2026-03-03
PUBLISHED
42d ago
2026-02-28
RELEVANCE
8/10