8 min read | Saved February 14, 2026
Do you care about this?
This article details how VectorChord reduced the time to index 100 million vectors in PostgreSQL from 40 hours to just 20 minutes while cutting memory usage to one-seventh. It outlines the specific optimizations in the clustering, insertion, and compaction phases that made this improvement possible.
If you do, here's more
VectorChord can now index 100 million 768-dimensional vectors in PostgreSQL in just 20 minutes. The same workload previously required about 200 GB of memory and 40 hours with pgvector, making it impractical at scale. The new method cuts memory usage sevenfold and builds indexes far faster, allowing deployment on cheaper machines without GPUs.
The article details several optimizations across three phases: initialization, insertion, and compaction. In the initialization phase, hierarchical K-means clustering was improved to speed up the process from 30 minutes on a GPU to just 8 minutes on a CPU. This was achieved by dividing the data into smaller subsets for clustering, which allowed for a more manageable computational load. The insertion phase saw a dramatic decrease in time from 420 minutes to just 9 minutes, while the compaction phase was reduced from 8 minutes to 1 minute.
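The subset-based clustering described above can be sketched as follows: run K-means on smaller chunks of the data, then cluster the pooled per-chunk centroids. This is a minimal illustration of the general technique, not VectorChord's actual code; all function names and parameters here are illustrative.

```python
import numpy as np

def kmeans(data, k, iters=10, seed=0):
    """Plain Lloyd's K-means; returns the (k, dim) centroid matrix."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest centroid
        dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # move each centroid to the mean of its assigned points
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centroids[j] = members.mean(0)
    return centroids

def hierarchical_kmeans(data, k, n_subsets=4, **kw):
    """Cluster each subset separately, then cluster the pooled centroids.

    Each sub-problem touches only 1/n_subsets of the data, so the
    per-step distance computation is much smaller than one global K-means.
    """
    subsets = np.array_split(data, n_subsets)
    sub_centroids = np.vstack([kmeans(s, k, **kw) for s in subsets])
    return kmeans(sub_centroids, k, **kw)
```

The final clustering runs on only `n_subsets * k` centroids rather than the full dataset, which is what makes the computational load manageable on a CPU.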
Memory usage was also tackled by focusing on dimensionality reduction. The article references Christos's findings that clustering can still be effective on lower-dimensional projections, which helps mitigate the high memory requirements of the original K-means algorithm. Overall, these optimizations enable VectorChord to manage substantial vector datasets efficiently while maintaining a balance between speed and accuracy.
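One standard way to cluster on a lower-dimensional projection is a Gaussian random projection, which approximately preserves pairwise distances (Johnson-Lindenstrauss). This sketch shows the memory effect for 768-dimensional vectors; it is a generic illustration of the technique, and the target dimension and function name are assumptions, not details from the article.

```python
import numpy as np

def reduce_for_clustering(data, target_dim=64, seed=0):
    # Gaussian random projection: pairwise distances are approximately
    # preserved, so K-means on the reduced vectors yields a similar
    # partition at a fraction of the memory cost.
    rng = np.random.default_rng(seed)
    proj = rng.normal(size=(data.shape[1], target_dim)) / np.sqrt(target_dim)
    return (data @ proj).astype(data.dtype)

# 768-dimensional vectors, as in the article's benchmark
vectors = np.random.default_rng(1).normal(size=(10_000, 768)).astype(np.float32)
reduced = reduce_for_clustering(vectors)
print(vectors.nbytes // reduced.nbytes)  # prints 12 (768 / 64)
```

Clustering then operates on `reduced`; memory for the working set shrinks in direct proportion to the dimension.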