Tulu prompting cuts contamination to 5%

// 123d ago · RESEARCH PAPER

This paper shows that a carefully structured prompt can get GPT-4o, Gemini 2.0 Flash, and Llama 3.1 70B to generate much cleaner Tulu without any fine-tuning, cutting Kannada vocabulary bleed from 80% to 5% and reaching 85% grammatical accuracy. It is a sharp result for low-resource language work because it treats prompt design itself as the intervention, not model retraining.
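The summary does not say how vocabulary bleed was measured. One simple reading of such a figure is the share of output words that appear in a blocklist for the dominant language. The sketch below works under that assumption only: `contamination_rate`, the `\w+` tokenizer, and the placeholder words are all hypothetical and are not the paper's metric or data. It also assumes romanized text, since `\w+` would split words written in Kannada script at their combining vowel signs.

```python
import re


def contamination_rate(text: str, blocklist: set[str]) -> float:
    """Return the fraction of word tokens in `text` found in `blocklist`."""
    # Crude tokenizer (assumption): lowercase, then take runs of word characters.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in blocklist)
    return hits / len(tokens)


# Placeholder tokens only; these are NOT real Kannada or Tulu vocabulary.
blocklist = {"kword1", "kword2"}
sample_output = "tword1 kword1 tword2 tword3"

print(contamination_rate(sample_output, blocklist))  # 1 of 4 tokens -> 0.25
```

A word-level ratio like this ignores morphology and shared cognates, so a real evaluation would likely need a curated lexicon or native-speaker review.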

// ANALYSIS

The big takeaway is that prompt engineering still has unexplored headroom, especially when the failure mode is distributional collapse into a better-represented neighboring language.

– The standout insight is the negative-constraint layer: telling the model which Kannada tokens to avoid did more than grammar notes alone
– The paper is useful beyond Tulu because many low-resource languages face the same asymmetry problem against a dominant linguistic neighbor
– The custom romanization scheme is an underrated systems detail, shrinking tokenization cost enough to fit richer linguistic scaffolding into context
– This also exposes a ceiling for prompt-only methods: if synthetic examples and self-critique depend on the same weak prior, scaling quality further may require curated data or fine-tuning
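The layered prompt described in the points above could be assembled roughly as follows. This is a minimal sketch, not the paper's actual prompt: the `PromptLayers` class, the `build_prompt` function, the section labels, and the placeholder words are all hypothetical, chosen only to show grammar notes, a negative-constraint list, and examples stacked ahead of the task.

```python
from dataclasses import dataclass, field


@dataclass
class PromptLayers:
    """Hypothetical container for the prompt layers the summary mentions."""
    grammar_notes: list[str] = field(default_factory=list)
    avoid_tokens: list[str] = field(default_factory=list)  # negative constraints
    examples: list[tuple[str, str]] = field(default_factory=list)  # (source, target)


def build_prompt(layers: PromptLayers, task: str) -> str:
    """Join the non-empty layers into one prompt, ending with the task."""
    sections = []
    if layers.grammar_notes:
        notes = "\n".join(f"- {note}" for note in layers.grammar_notes)
        sections.append("GRAMMAR NOTES:\n" + notes)
    if layers.avoid_tokens:
        # The negative-constraint layer: words the model is told not to use.
        sections.append("DO NOT USE THESE KANNADA WORDS:\n" + ", ".join(layers.avoid_tokens))
    if layers.examples:
        pairs = "\n".join(f"English: {src}\nTulu: {tgt}" for src, tgt in layers.examples)
        sections.append("EXAMPLES:\n" + pairs)
    sections.append("TASK:\n" + task)
    return "\n\n".join(sections)


# Placeholder content only; not real Kannada or Tulu vocabulary.
layers = PromptLayers(
    grammar_notes=["placeholder grammar note"],
    avoid_tokens=["kword1", "kword2"],
)
print(build_prompt(layers, "Translate the sentence into Tulu."))
```

Keeping each layer as a separate field makes it easy to ablate one at a time, for example running the same task with and without `avoid_tokens` to compare outputs.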

// TAGS

making-large-language-models-speak-tulu, llm, prompt-engineering, research

DISCOVERED

123d ago

2026-03-11

PUBLISHED

123d ago

2026-03-11

RELEVANCE

7/10

AUTHOR

GrowthExciting1126
