OPEN_SOURCE ↗
REDDIT · REDDIT// 2h agoOPENSOURCE RELEASE
quran-semantic-search open-sources Chinese Quran RAG corpus
This open-source repo packages five parallel Chinese translations of the Quran into ShareGPT-style and Alpaca-ready JSONL, plus a static semantic search UI. It is aimed at RAG, alignment, and fine-tuning workflows where localized ground truth is scarce.
// ANALYSIS
The core value here is not the UI, it’s the curated parallel corpus: a narrow, high-consistency dataset that is actually useful for retrieval, comparison, and instruction tuning in Chinese. That makes it more interesting as a domain data asset than as a general-purpose app.
- –Five aligned translations give you a built-in comparison set for semantic retrieval and answer-grounding experiments
- –ShareGPT and Alpaca exports reduce friction for LLaMA-Factory, Axolotl, and similar training stacks
- –The static SSR front end is a nice demo layer, but the dataset is the durable part of the release
- –This is strong niche infrastructure for multilingual RAG, but it is not broad enough to matter outside religious/textual domains
- –Licensing and provenance matter here; translation corpora can be useful without being free of copyright or attribution constraints
// TAGS
quran-semantic-searchragfine-tuningsearchopen-sourcedata-tools
DISCOVERED
2h ago
2026-04-20
PUBLISHED
4h ago
2026-04-20
RELEVANCE
7/ 10
AUTHOR
Omerpeace