OPEN_SOURCE
REDDIT // 25d ago // DISCUSSION
TrOCR Users Probe Multilingual Decoder Swaps
A Reddit user asks whether TrOCR's English-centric decoder can be swapped for a multilingual autoregressive decoder to handle Hindi handwriting. The question is technically pointed: TrOCR pairs an image Transformer encoder with a text Transformer decoder, so any replacement must still cross-attend to the encoder's image features and generate text autoregressively.
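The generation contract described above can be sketched without any framework. This is only an illustration under stated assumptions: the `DecoderStep` signature, the `greedy_decode` helper, and the toy decoder are hypothetical names invented here, not TrOCR's or any library's API. A real decoder would compute cross-attention over the encoder's image features inside the step function; the toy version merely reads them.

```python
from typing import Callable, List, Sequence

# Hypothetical interface (an assumption for illustration): given the encoder's
# image features and the token ids generated so far, return one score per
# vocabulary entry for the next token.
DecoderStep = Callable[[Sequence[Sequence[float]], List[int]], List[float]]


def greedy_decode(step: DecoderStep,
                  image_features: Sequence[Sequence[float]],
                  bos_id: int,
                  eos_id: int,
                  max_len: int = 32) -> List[int]:
    """Autoregressive greedy decoding: feed the growing prefix back in."""
    ids = [bos_id]
    for _ in range(max_len):
        scores = step(image_features, ids)
        # Pick the highest-scoring token id (first one wins on ties).
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids


def toy_step(image_features, prefix):
    """Toy stand-in for a decoder, NOT a real model.

    It "attends" to the encoder output by reading the feature row at
    position len(prefix) - 1 and emitting the token for that row's largest
    value. When the rows run out it predicts EOS.
    Toy vocabulary: 0 = BOS, 1 = EOS, 2.. = characters.
    """
    vocab_size = 5
    scores = [0.0] * vocab_size
    pos = len(prefix) - 1
    if pos >= len(image_features):
        scores[1] = 1.0  # no features left: predict EOS
    else:
        row = image_features[pos]
        best = max(range(len(row)), key=row.__getitem__)
        scores[2 + best] = 1.0
    return scores


features = [[0.1, 0.9, 0.0], [0.8, 0.1, 0.1]]
decoded = greedy_decode(toy_step, features, bos_id=0, eos_id=1)
```

Any candidate decoder has to fit this shape: it must accept the encoder output alongside the partial text, and it must be usable one step at a time so the prefix can grow until an end-of-sequence token appears.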
// ANALYSIS
The core instinct is right, but the swap is not plug-and-play: the decoder/tokenizer contract is doing a lot of work here.
- TrOCR's decoder is autoregressive and cross-attends to the encoder output, so the architecture itself supports generating a text sequence from image features.
- mT5 is the closer candidate conceptually, but it is a full encoder-decoder model, so you would use only its decoder stack and would still need to rebuild the text side around its tokenizer and generation setup.
- MuRIL is a BERT-style encoder, not a causal decoder, so it cannot be swapped in as a decoder the way the decoder half of a seq2seq model could.
- For Hindi OCR, the bigger bottleneck is usually script coverage and vocabulary, so a tokenizer that covers Devanagari, plus fine-tuning on Hindi handwriting, often matters more than the exact pretrained decoder.
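The script-coverage point in the last bullet can be checked before committing to a decoder. The sketch below is a rough, hypothetical helper (`script_coverage` and the sample vocabularies are invented for illustration, not taken from any real tokenizer): it measures what fraction of the distinct Devanagari characters in a text sample appear anywhere in a vocabulary. It assumes the vocabulary is a list of plain-text token strings; byte-level tokenizers would need a different check, since they can encode any character but may split Devanagari into many tokens.

```python
from typing import Iterable, Set


def devanagari_chars(text: str) -> Set[str]:
    """Collect the distinct characters in the Devanagari block (U+0900-U+097F)."""
    return {ch for ch in text if "\u0900" <= ch <= "\u097f"}


def script_coverage(vocab: Iterable[str], text: str) -> float:
    """Fraction of distinct Devanagari characters in `text` that occur
    in at least one vocabulary entry (1.0 if the text has none)."""
    needed = devanagari_chars(text)
    if not needed:
        return 1.0
    available = set("".join(vocab))
    return len(needed & available) / len(needed)


sample = "नमस्ते"

# Made-up vocabularies, for illustration only.
english_vocab = ["hello", "the", "##ing"]
multilingual_vocab = ["▁न", "म", "स्", "ते", "the"]

english_cov = script_coverage(english_vocab, sample)
multilingual_cov = script_coverage(multilingual_vocab, sample)
```

Full character coverage is a necessary condition rather than a sufficient one: a vocabulary can contain every character and still fragment Hindi words badly, which is where fine-tuning data and tokenizer choice come back in.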
// TAGS
trocr · fine-tuning · multimodal · open-source
DISCOVERED
2026-03-18 (25d ago)
PUBLISHED
2026-03-18 (25d ago)
RELEVANCE
8/10
AUTHOR
ElectronicHoneydew86