REDDIT // 28d ago // OPEN-SOURCE RELEASE
GraphZero v0.2 brings zero-copy GNN training via mmap
GraphZero is an open-source C++ graph engine that lets you train Graph Neural Networks on 50GB+ datasets on a 16GB laptop by memory-mapping data directly from NVMe instead of loading it into RAM. Version 0.2 adds Node2Vec biased walks and a FeatureStore, with PyPI install and nanobind-powered zero-copy NumPy/PyTorch interop.
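The memory-mapping idea can be sketched with nothing but the Python standard library. This is a minimal illustration, not GraphZero's API or file format: the flat float32 layout, the `DIM` constant, and the `write_features` / `read_row` helpers are all assumptions made up for the example.

```python
import mmap
import os
import tempfile
from array import array

DIM = 4  # hypothetical feature width, chosen for the example


def write_features(path, rows):
    """Store rows back to back as native-endian float32 (illustrative layout only)."""
    with open(path, "wb") as f:
        for row in rows:
            array("f", row).tofile(f)


def read_row(path, index):
    """Return one feature row through a read-only mapping of the file."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # The typed view shares the mapping's pages; nothing is read up front.
        with memoryview(mm) as raw, raw.cast("f") as view:
            # Slicing touches only the pages backing this row, then copies DIM floats out.
            return list(view[index * DIM:(index + 1) * DIM])


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "features.bin")
    write_features(path, [[n + 0.25 * j for j in range(DIM)] for n in range(1000)])
    print(read_row(path, 2))
```

The file could be far larger than RAM and `read_row` would behave the same, because the operating system pages data in on demand rather than loading the whole file. A real engine would presumably hand the mapped buffer to NumPy or PyTorch instead of copying values into a Python list.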
// ANALYSIS
The "OOM wall" in GNN training is a real and underserved problem: the usual PyG and DGL workflows assume the graph fits in RAM, and when it doesn't, the process crashes. GraphZero's mmap approach is the right systems-level answer.
- Zero-copy via mmap means OS page faults do the heavy lifting: only the 4KB pages that are actually touched land in RAM, dropping peak usage from 24GB+ to ~5GB on ogbn-papers100M
- Custom CSR binary format (`.gl`) achieves ~60% compression over raw CSV topology; the columnar `.gd` format maps directly to PyTorch tensors with zero allocation overhead
- OpenMP neighbor sampling releases the Python GIL, enabling true parallelism across disk I/O, CPU sampling, and GPU compute
- Currently scoped to data loading and sampling, not training itself, a deliberate narrow focus that keeps the library composable with any training loop
- Very early (7 GitHub stars, v0.2), but the roadmap toward dynamic graph updates and ACID multi-process safety signals serious long-term ambition
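To make the CSR and neighbor-sampling points concrete, here is a small sketch of how a CSR topology stores adjacency and how a sampler reads it. It rests on stated assumptions: plain Python lists stand in for the on-disk `.gl` arrays, the sampling is uniform and single-threaded rather than OpenMP-parallel, and `build_csr`, `neighbors`, and `sample_neighbors` are hypothetical names, not GraphZero functions.

```python
import random


def build_csr(num_nodes, edges):
    """Convert a directed edge list into CSR arrays (indptr, indices)."""
    degree = [0] * num_nodes
    for src, _ in edges:
        degree[src] += 1
    # indptr[n] .. indptr[n + 1] delimits node n's slice of `indices`.
    indptr = [0] * (num_nodes + 1)
    for node in range(num_nodes):
        indptr[node + 1] = indptr[node] + degree[node]
    indices = [0] * len(edges)
    cursor = indptr[:-1].copy()  # next free slot for each source node
    for src, dst in edges:
        indices[cursor[src]] = dst
        cursor[src] += 1
    return indptr, indices


def neighbors(indptr, indices, node):
    """Return the out-neighbors of `node` as one contiguous slice."""
    return indices[indptr[node]:indptr[node + 1]]


def sample_neighbors(indptr, indices, node, k, rng):
    """Uniformly sample up to k distinct neighbors of `node`."""
    nbrs = neighbors(indptr, indices, node)
    if len(nbrs) <= k:
        return nbrs
    return rng.sample(nbrs, k)


if __name__ == "__main__":
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 0)]
    indptr, indices = build_csr(4, edges)
    print(indptr, indices)
    print(sample_neighbors(indptr, indices, 0, 2, random.Random(0)))
```

Because each node's neighbors sit in one contiguous range, a lookup needs two reads from `indptr` and one sequential read from `indices`. That access pattern is what makes the layout a natural fit for a memory-mapped file, where only the pages covering that range have to be resident.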
// TAGS
graphzero, open-source, llm, devtool, data-tools, inference
DISCOVERED
28d ago
2026-03-15
PUBLISHED
28d ago
2026-03-15
RELEVANCE
5/10
AUTHOR
Important-Trash-4868