Dante-2B debuts as bilingual Italian-English LLM
Dante-2B is a 2.1B-parameter dense transformer trained from scratch on 300B tokens, aiming for native-level Italian fluency. The model uses a custom 64K BPE tokenizer, and its context window is being extended ahead of an open-source release.
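To put the two headline numbers in proportion, the short sketch below works out the tokens-per-parameter ratio and a rough training-compute estimate. Only the 2.1B parameter count and the 300B token count come from the summary above; the 6 × parameters × tokens rule is a common back-of-the-envelope approximation for dense transformers, assumed here and not a figure reported by the author.

```python
# Figures taken from the summary above.
params = 2.1e9   # model parameters
tokens = 300e9   # training tokens

# Ratio of training tokens to parameters.
tokens_per_param = tokens / params

# ASSUMPTION: the widely used ~6 * N * D estimate for dense-transformer
# training FLOPs. This is a rule of thumb, not a number from the author.
approx_train_flops = 6 * params * tokens

print(f"tokens per parameter: {tokens_per_param:.0f}")
print(f"approx. training FLOPs: {approx_train_flops:.2e}")
```

At roughly 143 tokens per parameter, the model sees far more data per weight than a minimal training budget would require, which is consistent with a small model meant to be pushed hard on a narrow language pair.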
Training a 2.1B model from scratch for a single language pair on consumer hardware suggests that a focused design can outperform generic fine-tuning of a multilingual model. The custom tokenizer is intended to reduce token overhead on Italian text by segmenting its morphology more cleanly, while from-scratch training is meant to avoid "translationese" artifacts. Specialized corpora such as FineWeb-2 IT help preserve linguistic nuance that is often lost in large multilingual models.
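The tokenizer-overhead claim can be checked with a simple "fertility" measurement: the average number of tokens produced per word. The sketch below is a minimal, hypothetical illustration. The `fertility` helper and the two toy tokenizers are stand-ins invented for this example; they are not the Dante-2B tokenizer, and the values they produce say nothing about the real model.

```python
from typing import Callable, List


def fertility(tokenize: Callable[[str], List[str]], text: str) -> float:
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text contains no words")
    return len(tokenize(text)) / len(words)


# HYPOTHETICAL toy tokenizers, used only to show the two extremes.
def char_tokenizer(text: str) -> List[str]:
    # One token per non-space character (worst-case overhead).
    return [c for c in text if not c.isspace()]


def word_tokenizer(text: str) -> List[str]:
    # One token per word (best-case overhead).
    return text.split()


sample = "il gatto dorme sereno"
char_fertility = fertility(char_tokenizer, sample)
word_fertility = fertility(word_tokenizer, sample)

print(f"character-level fertility: {char_fertility}")
print(f"word-level fertility: {word_fertility}")
```

A real comparison would pass the encode function of each tokenizer under test as `tokenize` and run it over a representative Italian corpus; a lower fertility means fewer tokens for the same text, and therefore more text per context window.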
DISCOVERED
53d ago
2026-04-06
PUBLISHED
53d ago
2026-04-05
AUTHOR
angeletti89
