TrOCR Users Probe Multilingual Decoder Swaps
A Reddit user asks whether TrOCR's English-centric decoder can be swapped for a multilingual autoregressive decoder to handle Hindi handwriting. The question is technically pointed: TrOCR pairs an image Transformer encoder with a text Transformer decoder, so any replacement decoder must still cross-attend to the image features and generate text autoregressively.
The core instinct is right, but the swap is not plug-and-play: the decoder and its tokenizer form a contract (vocabulary size, special-token IDs, generation settings), so replacing one means rebuilding the other around it.
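To make the decoder/tokenizer coupling concrete, here is a minimal sketch using hypothetical stand-in classes, not any real library's API. It checks two things a swapped-in decoder has to agree on with its tokenizer: every token ID must index a row of the decoder's embedding and output layer, and the special tokens used during generation (start, end, padding) must map to the IDs the decoder expects. In Hugging Face's `VisionEncoderDecoderModel`, the same coupling appears in config fields such as `decoder_start_token_id`, `pad_token_id`, and `vocab_size`, which TrOCR fine-tuning recipes commonly set from the tokenizer. The toy vocabulary and IDs below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class TokenizerInfo:
    """Hypothetical stand-in for a tokenizer: token string -> integer ID."""
    vocab: dict[str, int]
    bos_token: str
    eos_token: str
    pad_token: str


@dataclass
class DecoderInfo:
    """Hypothetical stand-in for the decoder-side config fields generation relies on."""
    vocab_size: int  # rows in the decoder's embedding and output projection
    decoder_start_token_id: int
    eos_token_id: int
    pad_token_id: int


def contract_errors(tok: TokenizerInfo, dec: DecoderInfo) -> list[str]:
    """List every mismatch between a tokenizer and a decoder config."""
    errors = []
    # Every tokenizer ID must index a row of the decoder's embedding/output layer.
    max_id = max(tok.vocab.values())
    if max_id >= dec.vocab_size:
        errors.append(f"tokenizer ID {max_id} exceeds decoder vocab_size {dec.vocab_size}")
    # Special tokens used during generation must map to the IDs the decoder expects.
    checks = {
        "decoder_start_token_id": (tok.bos_token, dec.decoder_start_token_id),
        "eos_token_id": (tok.eos_token, dec.eos_token_id),
        "pad_token_id": (tok.pad_token, dec.pad_token_id),
    }
    for name, (token, expected_id) in checks.items():
        actual_id = tok.vocab.get(token)
        if actual_id != expected_id:
            errors.append(f"{name}: tokenizer maps {token!r} to {actual_id}, decoder expects {expected_id}")
    return errors


# Toy example: a tiny Devanagari-aware tokenizer paired with a decoder config
# that is too small and whose EOS ID was copied from a different model.
tok = TokenizerInfo(
    vocab={"<s>": 0, "<pad>": 1, "</s>": 2, "क": 3, "ख": 4},
    bos_token="<s>", eos_token="</s>", pad_token="<pad>",
)
dec = DecoderInfo(vocab_size=4, decoder_start_token_id=0, eos_token_id=1, pad_token_id=1)
for problem in contract_errors(tok, dec):
    print(problem)
```

A real swap fails the same way, just less visibly: a mismatched end-of-sequence ID lets generation run on past the text, and an undersized vocabulary raises indexing errors at the embedding layer.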
- TrOCR's decoder is autoregressive and cross-attends to the image encoder's outputs, so the architecture already supports generating text sequences from image features.
- mT5 is the closer candidate conceptually: it is an encoder-decoder model whose decoder already cross-attends to encoder states, and its vocabulary covers Hindi. You would still need to rebuild the text side around its tokenizer and generation setup.
- MuRIL is an encoder-only, BERT-style model rather than a causal decoder, so it does not satisfy the swap-in-decoder requirement the way a seq2seq model's decoder would.
- For Hindi OCR, the bigger bottleneck is usually script coverage and vocabulary, so a multilingual tokenizer plus fine-tuning often matters more than the exact pretrained decoder.
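The vocabulary point can be made concrete with a rough sketch. A byte-level BPE vocabulary trained mostly on English text may have learned few Devanagari merges, in which case it falls back to raw UTF-8 bytes, and every Devanagari code point (U+0900 to U+097F) takes three bytes in UTF-8. The helpers below are hypothetical and assume that worst case (no Devanagari merges at all). They measure how much of a string is Devanagari, how many byte-fallback tokens it would cost, and what share of its distinct characters a given vocabulary covers. The English-only character set is an invented stand-in, not any real tokenizer's vocabulary.

```python
# Unicode Devanagari block: U+0900..U+097F.
DEVANAGARI = range(0x0900, 0x0980)


def devanagari_share(text: str) -> float:
    """Fraction of non-space code points that fall in the Devanagari block."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(ord(c) in DEVANAGARI for c in chars) / len(chars)


def byte_fallback_tokens(text: str) -> int:
    """Worst case for a byte-level BPE with no Devanagari merges: one token per UTF-8 byte."""
    return len(text.encode("utf-8"))


def char_coverage(text: str, vocab: set[str]) -> float:
    """Share of distinct non-space characters that exist as single tokens in vocab."""
    chars = {c for c in text if not c.isspace()}
    if not chars:
        return 1.0
    return len(chars & vocab) / len(chars)


text = "नमस्ते दुनिया"  # "Hello world" in Hindi; vowel signs and virama are separate code points
english_chars = set("abcdefghijklmnopqrstuvwxyz")  # hypothetical English-only character vocabulary
print("code points:", len(text))
print("byte-fallback tokens:", byte_fallback_tokens(text))  # 3 per Devanagari code point, 1 per space
print("Devanagari share:", devanagari_share(text))
print("coverage by English-only vocab:", char_coverage(text, english_chars))
```

Tripling the sequence length makes the decoder's job harder and slows generation, which is why a tokenizer with native Devanagari entries tends to matter before the choice of pretrained decoder does. Measuring the actual tokenizer on sample transcripts would give the real numbers.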
DISCOVERED
2026-03-18
PUBLISHED
2026-03-18
AUTHOR
ElectronicHoneydew86
