Tulu prompting cuts Kannada contamination to 5%
This paper shows that a carefully structured prompt can get GPT-4o, Gemini 2.0 Flash, and Llama 3.1 70B to generate much cleaner Tulu without any fine-tuning, cutting Kannada vocabulary bleed from 80% to 5% and reaching 85% grammatical accuracy. It is a sharp result for low-resource language work because it treats prompt design itself as the intervention, not model retraining.
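The bleed figures imply a per-output contamination rate. The summary does not say how the paper measures it, so the following is a minimal sketch that assumes bleed is the share of word tokens in romanized output that appear in a Kannada-only reference lexicon. The function name `kannada_bleed_rate` and the toy lexicon are hypothetical.

```python
import string


def kannada_bleed_rate(text: str, kannada_only_words: set[str]) -> float:
    """Share of word tokens in `text` found in a Kannada-only lexicon.

    Assumption (hypothetical metric, not the paper's): output is romanized,
    and the lexicon holds lowercase romanized Kannada forms that have no
    identical Tulu counterpart.
    """
    # Lowercase, split on whitespace, and trim surrounding punctuation.
    tokens = [t.strip(string.punctuation) for t in text.lower().split()]
    tokens = [t for t in tokens if t]  # drop tokens that were pure punctuation
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in kannada_only_words)
    return hits / len(tokens)


# Toy example: "illa" and "beku" are Kannada words; "aye" and "anta" are filler.
toy_lexicon = {"illa", "beku"}
print(kannada_bleed_rate("aye illa beku anta", toy_lexicon))  # 0.5
```

An exact-match lexicon only catches forms that are unambiguously Kannada. Words shared by both languages would need a finer, context-aware check.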
The big takeaway is that prompt engineering still has unexplored headroom, especially when the failure mode is distributional collapse into a better-represented neighboring language.
- The standout insight is the negative-constraint layer: telling the model which Kannada tokens to avoid helped more than grammar notes alone
- The paper is useful beyond Tulu because many low-resource languages face the same asymmetry problem against a dominant linguistic neighbor
- The custom romanization scheme is an underrated systems detail, shrinking tokenization cost enough to fit richer linguistic scaffolding into context
- This also exposes a ceiling for prompt-only methods: if synthetic examples and self-critique depend on the same weak prior, scaling quality further may require curated data or fine-tuning
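The negative-constraint layer and the self-critique step in the bullets above can be sketched as plain prompt assembly plus a post-hoc check. This is a minimal sketch under stated assumptions: the summary does not publish the paper's prompt template, so `LayeredPrompt`, `build_prompt`, `find_banned`, the layer order, and the instruction wording are all hypothetical. `find_banned` is a cheap deterministic stand-in for model self-critique, not a reproduction of it.

```python
import string
from dataclasses import dataclass, field


@dataclass
class LayeredPrompt:
    """Hypothetical prompt spec; field names are illustrative, not the paper's."""
    grammar_notes: list[str]
    avoid_tokens: set[str] = field(default_factory=set)            # Kannada forms to ban
    examples: list[tuple[str, str]] = field(default_factory=list)  # (English, Tulu) pairs


def build_prompt(spec: LayeredPrompt, task: str) -> str:
    """Assemble grammar notes, a negative-constraint layer, examples, then the task."""
    lines = ["Write only in romanized Tulu, not Kannada.", "Grammar notes:"]
    lines += [f"- {note}" for note in spec.grammar_notes]
    if spec.avoid_tokens:
        # Negative-constraint layer: name the Kannada tokens to avoid explicitly.
        lines.append("Never use these Kannada words:")
        lines += [f"- {tok}" for tok in sorted(spec.avoid_tokens)]  # sorted for stable prompts
    for english, tulu in spec.examples:
        lines.append(f"English: {english}\nTulu: {tulu}")
    lines.append(f"Task: {task}")
    return "\n".join(lines)


def find_banned(output: str, spec: LayeredPrompt) -> list[str]:
    """Deterministic check for a revision pass: which banned tokens leaked into output."""
    tokens = {t.strip(string.punctuation) for t in output.lower().split()}
    return sorted(spec.avoid_tokens & tokens)


# Usage with placeholder notes; "illa" and "beku" are Kannada words.
spec = LayeredPrompt(grammar_notes=["<grammar note placeholder>"], avoid_tokens={"illa", "beku"})
print(build_prompt(spec, "Translate: 'I do not want it.'"))
print(find_banned("Illa, aye beku!", spec))  # ['beku', 'illa']
```

Sorting the banned list keeps the prompt byte-identical across runs, which makes comparisons between prompt variants easier. A non-empty result from `find_banned` could be turned into a revision instruction, though a model-driven critique would still lean on the same weak Tulu prior.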
Discovered: 2026-03-11
Published: 2026-03-11
Author: GrowthExciting1126