Dual-engine AI music detector survives MP3
A Reddit project pairs a ResNet18 mel-spectrogram classifier with Demucs-based stem separation and reconstruction to spot AI-generated music. It holds up on MP3, AAC, and OGG inputs, where the CNN alone breaks down, and the author reports roughly 1.1% false positives on human-made tracks and over 80% detection of AI-generated ones.
The clever part isn’t just stacking two models; it’s using a cheap confidence gate so the expensive separation pass runs only when the classifier is unsure. That makes the system feel more production-shaped than a single end-to-end detector, even if the edge cases are still messy.
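A minimal sketch of the confidence gate, assuming each engine is wrapped as a function that returns the probability that a track is AI-generated. The names `gated_detect`, `cnn_scorer`, and `separation_scorer` and the 0.2/0.8 uncertainty band are hypothetical. The post does not say how the two scores are combined, so here the separation engine simply makes the call whenever it runs.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

# Hypothetical interface: a scorer maps an audio buffer to P(AI-generated) in [0, 1].
Scorer = Callable[[Sequence[float]], float]


@dataclass
class Verdict:
    label: str             # "ai" or "human"
    score: float           # AI probability the decision was based on
    used_separation: bool  # True if the expensive second engine ran


def gated_detect(
    audio: Sequence[float],
    cnn_scorer: Scorer,
    separation_scorer: Scorer,
    low: float = 0.2,
    high: float = 0.8,
    threshold: float = 0.5,
) -> Verdict:
    """Cascade: run the cheap CNN first, the separation engine only when unsure.

    (low, high) is the CNN's "uncertain" band. The defaults are placeholders,
    not the project's actual thresholds.
    """
    if not (0.0 <= low < threshold < high <= 1.0):
        raise ValueError("expected 0 <= low < threshold < high <= 1")
    p = cnn_scorer(audio)
    if p <= low or p >= high:
        # Confident CNN verdict: skip source separation entirely.
        return Verdict("ai" if p >= threshold else "human", p, False)
    # Uncertain band: defer to the separation engine (an assumed combination rule).
    q = separation_scorer(audio)
    return Verdict("ai" if q >= threshold else "human", q, True)
```

Because separation runs only inside the uncertain band, average per-track cost is roughly the CNN cost plus f times the separation cost, where f is the fraction of tracks that land in that band. Widening the band sends more borderline tracks to the second engine at the price of more Demucs passes.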
- Mel-spectrogram CNNs can look strong on WAV and then lose the signal once lossy compression strips the artifacts they learned.
- The Demucs path tests a different hypothesis: human recordings leak across stems, while fully synthetic tracks tend to reconstruct too cleanly after separation and remixing.
- The compute tradeoff is sensible, because source separation is costly and shouldn’t run on every track if the CNN already has high confidence.
- The biggest risk is generalization: different generators, mastering chains, and Demucs nondeterminism can all move borderline samples around.
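The stem-separation hypothesis can be illustrated with a reconstruction residual: split the track into stems, sum them back, and measure how much energy the remix misses. This is a sketch under assumptions, not the project's scoring. The stems are taken as a NumPy array already produced by a separator such as Demucs, and `reconstruction_residual_db`, `looks_synthetic`, and the -30 dB cutoff are hypothetical.

```python
import numpy as np


def reconstruction_residual_db(mix: np.ndarray, stems: np.ndarray, eps: float = 1e-12) -> float:
    """Energy of (mix - sum of stems) relative to the mix energy, in dB.

    mix:   (channels, samples) waveform.
    stems: (n_stems, channels, samples) separated sources, e.g. from Demucs
           (running the separator is outside this sketch).
    More negative values mean the stems add back up to the mix more exactly.
    """
    if stems.shape[1:] != mix.shape:
        raise ValueError("stems must have shape (n_stems, *mix.shape)")
    residual = mix - stems.sum(axis=0)
    ratio = (np.sum(residual ** 2) + eps) / (np.sum(mix ** 2) + eps)
    return float(10.0 * np.log10(ratio))


def looks_synthetic(mix: np.ndarray, stems: np.ndarray, threshold_db: float = -30.0) -> bool:
    """Hypothetical rule: a residual below threshold_db counts as 'too clean'."""
    return reconstruction_residual_db(mix, stems) < threshold_db
```

A lower residual means the stems reassemble the mix almost exactly, the "too clean" signature the second point describes. Any real cutoff would need calibrating per generator and mastering chain, and repeated separation runs can shift borderline scores, which is the generalization risk the last point raises.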
Discovered: 2026-03-28
Published: 2026-03-27
Author: Leather_Lobster_2558