OPEN_SOURCE
REDDIT // 3d ago · MODEL RELEASE
Developer trains 90M parameter embedding model from scratch
A developer has open-sourced Rocky-Embed, a 90M-parameter custom Transformer embedding model trained entirely from scratch on Google Colab. The model maps text to 1024-dimensional vectors and was trained via knowledge distillation from a larger Cohere teacher model.
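The distillation setup described here reduces to regressing student embeddings onto precomputed teacher vectors. A minimal sketch of that objective in PyTorch (the class name, dimensions, and optional projection are illustrative assumptions, not details from the release):

```python
import torch
import torch.nn as nn


class EmbeddingDistillationLoss(nn.Module):
    """MSE between student embeddings and frozen teacher embeddings.

    The teacher vectors are precomputed (e.g. from a Cohere API), so only
    the student model receives gradients.
    """

    def __init__(self, student_dim: int = 1024, teacher_dim: int = 1024):
        super().__init__()
        # Project only if the student and teacher dimensions differ.
        self.proj = (nn.Identity() if student_dim == teacher_dim
                     else nn.Linear(student_dim, teacher_dim))
        self.mse = nn.MSELoss()

    def forward(self, student_emb: torch.Tensor,
                teacher_emb: torch.Tensor) -> torch.Tensor:
        # detach() keeps the teacher side out of the gradient graph.
        return self.mse(self.proj(student_emb), teacher_emb.detach())


# Usage sketch with random stand-ins for real embeddings.
loss_fn = EmbeddingDistillationLoss()
student = torch.randn(8, 1024, requires_grad=True)
teacher = torch.randn(8, 1024)
loss = loss_fn(student, teacher)
loss.backward()
```

Because the targets are fixed vectors rather than in-batch negatives, each step is a cheap per-example regression, which is why this is less compute-hungry than contrastive training.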
// ANALYSIS
While not SOTA in quality, Rocky-Embed is a compelling proof of concept for the democratization of AI training, showing that individual developers can still build and distill custom architectures from scratch on consumer platforms.
- Trained with a custom architecture featuring RoPE, RMSNorm, and QK normalization, rather than a standard BERT-like structure
- Uses direct MSE distillation against Cohere's multilingual Wikipedia embeddings instead of more computationally expensive contrastive learning
- Achieves a 0.5453 Spearman correlation on the STS benchmark after 50k training steps
- Demonstrates the viability of Colab Pro for small-scale model experimentation, once initial exploding-gradient issues were resolved
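The STS score cited above is a Spearman rank correlation between model similarity scores and human judgments. A self-contained sketch of that metric (the toy scores are made up; real STS evaluation compares cosine similarities of sentence pairs against human 0-5 ratings):

```python
import numpy as np


def spearman(x, y):
    """Spearman rank correlation for distinct values (no tie handling).

    Computed as the Pearson correlation of the ranks of x and y.
    """
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))


# Toy example: model cosine similarities vs. human similarity ratings.
model_sims = [0.91, 0.35, 0.70, 0.12, 0.55]
human_scores = [4.8, 1.2, 3.9, 0.5, 2.7]
print(round(spearman(model_sims, human_scores), 4))  # → 1.0 (identical rank order)
```

A score of 0.5453 therefore means the model's similarity rankings agree with human rankings moderately well, roughly halfway between random ordering (0) and perfect agreement (1).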
// TAGS
rocky-embed · embedding · open-weights
DISCOVERED
2026-04-08
PUBLISHED
2026-04-08
RELEVANCE
6/10
AUTHOR
ConfectionAfter2366