OPEN_SOURCE
REDDIT · 3h ago · MODEL RELEASE
Vintage Talkie drops with 1930 cutoff
Talkie is a new 13B parameter language model trained exclusively on 260 billion tokens of text published before 1931. Developed by researchers including Alec Radford, the project aims to create a contamination-free environment for testing generalization and forecasting by simulating a conversational partner from the early 20th century.
// ANALYSIS
Training an LLM entirely on historical data is a brilliant move for clean benchmarking, finally giving researchers a model guaranteed to have zero exposure to modern test sets.
- A hard 1931 cutoff provides a genuinely contamination-free baseline for evaluating reasoning and generalization
- Custom post-training using historical etiquette manuals and cookbooks ensures the model's tone matches its era
- Building a specialized OCR pipeline for noisy historical scans highlights the immense data engineering required for vintage datasets
- The planned expansion to a GPT-3.5 scale model by summer 2026 could establish a new standard for academic evaluation
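The contamination guarantee rests entirely on provenance filtering at ingestion time: any document whose publication date is unknown or post-cutoff must be dropped before training. A minimal sketch of such a date-cutoff filter, assuming a simple document record with a possibly-missing publication year (all names here are illustrative, not from the Talkie project):

```python
from dataclasses import dataclass
from typing import Optional

CUTOFF_YEAR = 1931  # Talkie's reported rule: only text published before 1931


@dataclass
class Document:
    text: str
    publication_year: Optional[int]  # None means provenance could not be verified


def keep(doc: Document) -> bool:
    """Conservative filter: drop anything without a verified pre-cutoff date."""
    return doc.publication_year is not None and doc.publication_year < CUTOFF_YEAR


docs = [
    Document("A Tale of Two Cities ...", 1859),
    Document("Scanned pamphlet, date unreadable", None),
    Document("Magazine article ...", 1954),
]

clean = [d for d in docs if keep(d)]
# Only the 1859 document survives; undated and post-cutoff material is excluded.
```

The conservative choice (rejecting undated material rather than guessing) is what makes the "zero exposure to modern test sets" claim defensible, at the cost of discarding scans whose dates the OCR pipeline cannot recover.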
// TAGS
talkie · llm · research · fine-tuning · benchmark
DISCOVERED
3h ago
2026-04-28
PUBLISHED
6h ago
2026-04-28
RELEVANCE
8/10
AUTHOR
The_frozen_one