OPEN_SOURCE
REDDIT · 3h ago · MODEL RELEASE
Vintage Talkie drops with 1930 cutoff
Talkie is a new 13B parameter language model trained exclusively on 260 billion tokens of text published before 1931. Developed by researchers including Alec Radford, the project aims to create a contamination-free environment for testing generalization and forecasting by simulating a conversational partner from the early 20th century.
// ANALYSIS
Training an LLM entirely on historical data is a brilliant move for clean benchmarking, finally giving researchers a model guaranteed to have zero exposure to modern test sets.
- A hard 1931 cutoff provides a genuinely contamination-free baseline for evaluating reasoning and generalization
- Custom post-training using historical etiquette manuals and cookbooks ensures the model's tone matches its era
- Building a specialized OCR pipeline for noisy historical scans highlights the immense data engineering required for vintage datasets
- The planned expansion to a GPT-3.5 scale model by summer 2026 could establish a new standard for academic evaluation
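The contamination guarantee rests entirely on provenance filtering at ingestion time: any document whose publication date is unknown or post-cutoff must be dropped before training. A minimal sketch of such a date-cutoff filter, assuming a simple document record with a possibly-missing publication year (all names here are illustrative, not from the Talkie project):

```python
from dataclasses import dataclass
from typing import Optional

CUTOFF_YEAR = 1931  # Talkie's reported rule: only text published before 1931


@dataclass
class Document:
    text: str
    publication_year: Optional[int]  # None means provenance could not be verified


def keep(doc: Document) -> bool:
    """Conservative filter: drop anything without a verified pre-cutoff date."""
    return doc.publication_year is not None and doc.publication_year < CUTOFF_YEAR


docs = [
    Document("A Tale of Two Cities ...", 1859),
    Document("Scanned pamphlet, date unreadable", None),
    Document("Magazine article ...", 1954),
]

clean = [d for d in docs if keep(d)]
# Only the 1859 document survives; undated and post-cutoff material is excluded.
```

The conservative choice (rejecting undated material rather than guessing) is what makes the "zero exposure to modern test sets" claim defensible, at the cost of discarding scans whose dates the OCR pipeline cannot recover.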
// TAGS
talkie · llm · research · fine-tuning · benchmark
DISCOVERED
3h ago
2026-04-28
PUBLISHED
6h ago
2026-04-28
RELEVANCE
8/10
AUTHOR
The_frozen_one