Code Review Dataset drops 355k review rows

// 125d ago | OPEN SOURCE RELEASE

Ronan Takizawa released an open Hugging Face dataset of 355,807 code review examples built from 725 permissively licensed GitHub repos across 37 languages. Each example pairs a human review comment with the before/after code change, and the set also includes negative examples where no review comment was needed. The same release claims that a Qwen2.5-Coder-32B fine-tune on the dataset scored roughly 4x higher on BLEU-4, ROUGE-L, and SBERT than the base model on review-style tasks.
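For readers unfamiliar with the overlap metrics named above, ROUGE-L scores a generated review comment by its longest common subsequence (LCS) with the human reference. The following is a minimal sketch, assuming lowercased whitespace tokenization and the balanced F1 variant; the function names are illustrative and this is not necessarily the evaluation setup used in the release.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 under the assumptions stated above (illustrative only)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)  # share of candidate tokens in the LCS
    recall = lcs / len(ref)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


# A model comment that drops one word from the human reference.
score = rouge_l_f1("use a constant here", "use a named constant here")
print(round(score, 3))
```

Because the score depends on tokenization and on which F-measure weighting is used, figures computed this way are only comparable when both models are evaluated with the same settings.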

// ANALYSIS

This is the kind of dataset AI coding research has been missing: real reviewer feedback tied to actual code edits instead of synthetic instruction fluff.

– The strongest signal here is the triplet structure: diff context, reviewer comment, and resulting code change all in one row
– Negative examples matter almost as much as positive ones because they teach models when clean code should pass without noisy comments
– Coverage across 725 repos and 37 languages makes it more useful for generalist coding models than single-language benchmark sets
– The permissive-license filtering lowers legal friction for teams experimenting with fine-tuning on real OSS review data
– The reported model gains are promising, but they are still self-reported metrics rather than an independently validated benchmark
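The triplet-plus-negatives structure described in the bullets can be pictured roughly as follows. This is a sketch only: the field names (`language`, `before_code`, `after_code`, `reviewer_comment`) are hypothetical and do not reflect the dataset's actual column names, and representing a negative example as a row with no comment is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ReviewRow:
    # Hypothetical schema; the real dataset's columns may differ.
    language: str
    before_code: str
    after_code: str
    reviewer_comment: Optional[str]  # None marks an assumed negative example

    @property
    def is_negative(self) -> bool:
        """True when the change needed no review comment."""
        return self.reviewer_comment is None


def split_rows(rows: List[ReviewRow]) -> Tuple[List[ReviewRow], List[ReviewRow]]:
    """Separate rows with reviewer feedback from rows without any."""
    positives = [r for r in rows if not r.is_negative]
    negatives = [r for r in rows if r.is_negative]
    return positives, negatives


# Two made-up rows: one with feedback, one that should pass silently.
rows = [
    ReviewRow(
        language="python",
        before_code="timeout = 30",
        after_code="DEFAULT_TIMEOUT_S = 30\ntimeout = DEFAULT_TIMEOUT_S",
        reviewer_comment="Please pull the magic number into a named constant.",
    ),
    ReviewRow(
        language="go",
        before_code="return nil",
        after_code="return nil",
        reviewer_comment=None,
    ),
]

positives, negatives = split_rows(rows)
print(len(positives), len(negatives))
```

Keeping the two groups separable makes it possible to control how many "no comment needed" rows go into a fine-tuning mix, which is one way to discourage a model from commenting on every change.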

// TAGS

code-review-dataset, code-review, ai-coding, fine-tuning, open-source, research

DISCOVERED: 2026-03-09 (125d ago)

PUBLISHED: 2026-03-09 (125d ago)

RELEVANCE: 8/10

AUTHOR: Ok_Employee_6418
