Developer trains 90M parameter embedding model from scratch
OPEN_SOURCE
REDDIT // 3d ago // MODEL RELEASE

A developer has open-sourced Rocky-Embed, a 90M parameter custom Transformer embedding model trained entirely from scratch on Google Colab. The model maps text to 1024-dimensional vectors using knowledge distillation from a larger Cohere teacher model.
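The distillation objective described here can be sketched in a few lines: the student is trained so its output vector matches the teacher's embedding of the same text. This is a minimal pure-Python illustration, not code from the Rocky-Embed repo; the function name and the 4-dimensional toy vectors are hypothetical (the real model emits 1024 dimensions).

```python
# Hypothetical sketch of the MSE distillation objective: the student
# embedding is pushed toward the teacher (Cohere) embedding of the
# same text by minimizing mean squared error between the two vectors.

def mse_distill_loss(student_vec, teacher_vec):
    """Mean squared error between student and teacher embeddings."""
    assert len(student_vec) == len(teacher_vec)
    return sum((s - t) ** 2 for s, t in zip(student_vec, teacher_vec)) / len(student_vec)

# Toy 4-dim example (the real model uses 1024-dim vectors):
student = [0.1, 0.2, 0.3, 0.4]
teacher = [0.0, 0.2, 0.4, 0.4]
print(round(mse_distill_loss(student, teacher), 4))  # 0.005
```

Compared with contrastive learning, this objective needs no negative sampling or large in-batch comparisons, which is what makes it cheap enough for a Colab budget.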

// ANALYSIS

While not SOTA quality, Rocky-Embed is a compelling proof of concept for the democratization of AI training, showing that individual developers can still build and distill custom architectures from scratch on consumer platforms.

  • Trained with a custom architecture featuring RoPE, RMSNorm, and QK Normalization, rather than a standard BERT-like design
  • Uses direct MSE distillation from Cohere's multilingual Wikipedia embeddings instead of more computationally expensive contrastive learning
  • Achieves a 0.5453 Spearman correlation on the STS benchmark after 50k training steps
  • Demonstrates the viability of Colab Pro for small-scale model experimentation; the developer had to work through initial exploding-gradient issues along the way
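The 0.5453 Spearman score cited above is the standard STS metric: the rank correlation between the model's similarity scores and human similarity judgments across sentence pairs. A minimal pure-Python sketch of that computation (hypothetical function and toy data; assumes no tied scores, where the closed-form rank-difference formula is exact):

```python
# Hypothetical sketch of the STS evaluation metric: Spearman rank
# correlation between model similarity scores and human judgments.

def spearman(xs, ys):
    """Spearman rank correlation via the rank-difference formula.

    Assumes no tied values; with ties, average ranks would be needed.
    """
    n = len(xs)
    rx = {v: i for i, v in enumerate(sorted(xs))}  # rank of each model score
    ry = {v: i for i, v in enumerate(sorted(ys))}  # rank of each human score
    d2 = sum((rx[x] - ry[y]) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Toy example: model cosine similarities vs. human 0-5 ratings.
model_sims = [0.9, 0.1, 0.5, 0.7]
human_scores = [4.8, 0.5, 2.9, 3.1]
print(spearman(model_sims, human_scores))  # 1.0 — identical rankings
```

Because Spearman only compares rankings, a score of 0.5453 means the model orders sentence pairs moderately consistently with humans even if its raw similarity values are uncalibrated.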
// TAGS
rocky-embed · embedding · open-weights

DISCOVERED

2026-04-08

PUBLISHED

2026-04-08

RELEVANCE

6 / 10

AUTHOR

ConfectionAfter2366