OPEN_SOURCE
REDDIT // 10d ago // OPENSOURCE RELEASE
EVōC Challenges UMAP-HDBSCAN on Embedding Clustering
EVōC is a new Python library for clustering high-dimensional embedding vectors, built specifically for embeddings such as those from CLIP, sentence-transformer, OpenAI, and Cohere models. It reworks the UMAP + HDBSCAN approach around embedding data and offers a scikit-learn-style API, multi-granularity cluster layers, hierarchy extraction, and near-duplicate detection. The project is positioned as an early beta, but the pitch is clear: better clustering results with less tuning and much faster runtime, with performance said to be competitive with MiniBatchKMeans.
// ANALYSIS
Strong niche fit, especially for teams that already cluster embeddings and are tired of babysitting UMAP + HDBSCAN.
- The differentiation is practical, not flashy: it targets a real pain point in embedding workflows, where high dimensionality and compute cost make generic clustering awkward.
- The scikit-learn-compatible API lowers adoption friction for ML engineers who already have clustering pipelines.
- Multi-layer clustering and hierarchy support make it more interesting than a plain “faster KMeans alternative,” especially for topic modeling and analysis workflows.
- The project still reads as early beta, so its claims should be validated on real datasets before it is treated as a drop-in replacement.
- There does not appear to be a Product Hunt listing yet, so this is primarily a GitHub/PyPI open-source launch rather than a broader consumer product launch.
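One way to act on the validation point above is a small harness that times any estimator exposing a scikit-learn-style `fit_predict` and summarizes its labels. This is a minimal sketch, not EVōC's own tooling: `benchmark` and `SignClusterer` are hypothetical names, the toy clusterer exists only so the snippet runs without dependencies, and treating label `-1` as noise is an assumption borrowed from the HDBSCAN convention.

```python
import time
from collections import Counter


def benchmark(clusterer, data):
    """Time one fit_predict call and summarize the resulting labels.

    Works with any object exposing a scikit-learn-style fit_predict(data).
    Label -1 is treated as noise (assumption: HDBSCAN-style convention).
    """
    start = time.perf_counter()
    labels = [int(label) for label in clusterer.fit_predict(data)]
    elapsed = time.perf_counter() - start

    counts = Counter(labels)
    noise = counts.pop(-1, 0)  # remove the noise label before counting clusters
    return {
        "seconds": elapsed,
        "n_clusters": len(counts),
        "noise_fraction": noise / len(labels) if labels else 0.0,
    }


class SignClusterer:
    """Hypothetical toy stand-in so the sketch runs without dependencies.

    Assigns each vector to cluster 0 or 1 by the sign of its first
    coordinate and marks vectors whose first coordinate is 0 as noise (-1).
    """

    def fit_predict(self, data):
        labels = []
        for row in data:
            if row[0] == 0:
                labels.append(-1)
            else:
                labels.append(0 if row[0] < 0 else 1)
        return labels


toy_data = [[-1.0, 0.2], [-0.5, 0.1], [0.7, 0.3], [0.9, -0.4], [0.0, 0.0]]
report = benchmark(SignClusterer(), toy_data)
print(report["n_clusters"], report["noise_fraction"])
```

In practice the toy clusterer would be swapped for real candidates run on the same embedding matrix, for example `sklearn.cluster.MiniBatchKMeans` and an EVōC estimator. The exact EVōC class name and defaults should be taken from the project's README rather than from this sketch.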
// TAGS
clustering embeddings python llm hdbscan umap scikit-learn topic-modeling
DISCOVERED
10d ago
2026-04-01
PUBLISHED
11d ago
2026-04-01
RELEVANCE
8/10
AUTHOR
lmcinnes