BACK_TO_FEEDAICRIER_2
quran-semantic-search open-sources Chinese Quran RAG corpus
OPEN_SOURCE ↗
REDDIT · REDDIT// 2h agoOPENSOURCE RELEASE

quran-semantic-search open-sources Chinese Quran RAG corpus

This open-source repo packages five parallel Chinese translations of the Quran into ShareGPT-style and Alpaca-ready JSONL, plus a static semantic search UI. It is aimed at RAG, alignment, and fine-tuning workflows where localized ground truth is scarce.

// ANALYSIS

The core value here is not the UI, it’s the curated parallel corpus: a narrow, high-consistency dataset that is actually useful for retrieval, comparison, and instruction tuning in Chinese. That makes it more interesting as a domain data asset than as a general-purpose app.

  • Five aligned translations give you a built-in comparison set for semantic retrieval and answer-grounding experiments
  • ShareGPT and Alpaca exports reduce friction for LLaMA-Factory, Axolotl, and similar training stacks
  • The static SSR front end is a nice demo layer, but the dataset is the durable part of the release
  • This is strong niche infrastructure for multilingual RAG, but it is not broad enough to matter outside religious/textual domains
  • Licensing and provenance matter here; translation corpora can be useful without being free of copyright or attribution constraints
// TAGS
quran-semantic-searchragfine-tuningsearchopen-sourcedata-tools

DISCOVERED

2h ago

2026-04-20

PUBLISHED

4h ago

2026-04-20

RELEVANCE

7/ 10

AUTHOR

Omerpeace