OPEN_SOURCE
REDDIT // 16d ago · MODEL RELEASE

Meta AI unveils TRIBE v2 foundation model

Meta AI’s TRIBE v2 is a tri-modal foundation model that predicts human fMRI brain responses to video, audio, and language, trained on over 1,000 hours of fMRI from 720 subjects. The release pairs the paper with public code, weights, and a demo, and reports several-fold gains over traditional linear encoding models on novel stimuli, tasks, and subjects.
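
For context on the baseline being beaten, here is a minimal sketch of what a traditional linear encoding model looks like: ridge regression from per-timepoint stimulus features to per-voxel fMRI responses, scored by held-out correlation. The feature dimensions, voxel counts, and use of scikit-learn are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of a "traditional linear encoding model" baseline:
# ridge regression from stimulus features to per-voxel fMRI responses.
# All dimensions and data below are synthetic placeholders.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

n_train, n_test = 800, 200   # fMRI timepoints (TRs), hypothetical
n_features = 512             # stimulus feature dimension, hypothetical
n_voxels = 1000              # voxels to predict, hypothetical

# Stimulus features (e.g. from a video/audio/text encoder) and responses.
X_train = rng.standard_normal((n_train, n_features))
X_test = rng.standard_normal((n_test, n_features))
Y_train = rng.standard_normal((n_train, n_voxels))
Y_test = rng.standard_normal((n_test, n_voxels))

# One ridge model fit jointly over all voxels (multi-output regression).
encoder = Ridge(alpha=1.0)
encoder.fit(X_train, Y_train)
Y_pred = encoder.predict(X_test)

# Encoding performance is typically reported as per-voxel Pearson
# correlation between predicted and measured responses on held-out data.
def voxelwise_correlation(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    y_true = y_true - y_true.mean(axis=0)
    y_pred = y_pred - y_pred.mean(axis=0)
    num = (y_true * y_pred).sum(axis=0)
    denom = np.sqrt((y_true ** 2).sum(axis=0) * (y_pred ** 2).sum(axis=0))
    return num / np.maximum(denom, 1e-12)

print("mean held-out correlation:", voxelwise_correlation(Y_test, Y_pred).mean())
```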

// ANALYSIS

This is less mind-reading hype than a serious attempt to turn neuroscience into a reproducible multimodal benchmark. The most interesting part is that Meta shipped the research stack alongside the paper, so other labs can probe the claims instead of treating them as a one-off result.

  • Scale is the moat here: 1,000+ hours of fMRI from 720 subjects is the kind of breadth that can expose whether a model really generalizes.
  • The architecture is pragmatic, reusing Llama 3.2, V-JEPA 2, and Wav2Vec-BERT rather than inventing bespoke encoders for every modality (a rough sketch of this encoder-reuse pattern follows the list).
  • The ablations suggest multimodal modeling matters most in high-level associative cortex, while unimodal models still dominate their native sensory territories.
  • The in-silico experiments are the real differentiator, because reproducing classic visual and neuro-linguistic results turns TRIBE v2 into a hypothesis-testing tool, not just a benchmark win.
  • The paper and code are public, which should make replication and follow-on work easier: https://ai.meta.com/research/publications/a-foundation-model-of-vision-audition-and-language-for-in-silico-neuroscience/ and https://github.com/facebookresearch/tribev2.
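
As a rough illustration of that encoder-reuse pattern (not the actual TRIBE v2 architecture, which is specified in the paper), the sketch below concatenates frozen per-modality features and maps them to voxel responses through a small trained head. The dimensions and the two-layer head are assumptions for illustration only.

```python
# Sketch of fusing frozen pretrained per-modality features into a single
# trained head that predicts per-voxel fMRI responses. Dimensions are made up.
import torch
import torch.nn as nn

class TriModalEncodingHead(nn.Module):
    def __init__(self, d_video: int, d_audio: int, d_text: int,
                 d_hidden: int, n_voxels: int):
        super().__init__()
        # Only this head is trained; upstream encoders (e.g. V-JEPA 2,
        # Wav2Vec-BERT, Llama 3.2) would act as frozen feature extractors.
        self.head = nn.Sequential(
            nn.Linear(d_video + d_audio + d_text, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, n_voxels),
        )

    def forward(self, video_feats, audio_feats, text_feats):
        # Late fusion: concatenate modality features, then project to voxels.
        fused = torch.cat([video_feats, audio_feats, text_feats], dim=-1)
        return self.head(fused)

# Fake per-timepoint features standing in for frozen encoder outputs.
B = 16  # batch of fMRI timepoints
model = TriModalEncodingHead(d_video=1024, d_audio=768, d_text=2048,
                             d_hidden=512, n_voxels=1000)
pred = model(torch.randn(B, 1024), torch.randn(B, 768), torch.randn(B, 2048))
print(pred.shape)  # torch.Size([16, 1000])
```

The point of the design is that the expensive representation learning is inherited from existing unimodal models, so only a comparatively small fusion component has to be fit to the fMRI data.
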
// TAGS
tribe-v2 · multimodal · research · benchmark · open-weights

DISCOVERED
2026-03-26 (16d ago)

PUBLISHED
2026-03-26 (16d ago)

RELEVANCE
9/10

AUTHOR
141_1337