6 min read | Saved February 14, 2026
The article describes how the author built a vector database modeled on Turbopuffer's architecture, with Amazon S3 as the storage layer. It covers the design challenges, trade-offs, and incremental improvements made along the way, with a focus on performance and cost efficiency.
Turbopuffer is a vector database built around object storage such as Amazon S3, a design that challenges assumptions behind traditional databases. Curious about this architecture, the author builds a simplified version to explore its trade-offs. Updates and deletes are a key challenge, because S3 was not designed for database-style workloads: an object cannot be modified in place, only rewritten whole. Latency is another major concern, since each extra round trip to S3 adds roughly 200 milliseconds to a query, which users feel directly.
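A back-of-envelope sketch of why round trips dominate: assuming each sequential S3 request costs a fixed 200 ms and ignoring transfer time, a query's latency is simply the number of requests on its critical path times 200 ms. The four-step query plan below is hypothetical and not taken from the article.

```python
ROUND_TRIP_MS = 200  # approximate cost of one S3 round trip, as quoted above

# Hypothetical query plan: each request must finish before the next one starts.
query_plan = ["LIST namespace prefix", "GET manifest", "GET index", "GET vectors"]


def estimated_latency_ms(steps: list[str], round_trip_ms: int = ROUND_TRIP_MS) -> int:
    """Sequential requests add up; this model ignores transfer time and parallelism."""
    return len(steps) * round_trip_ms


print(estimated_latency_ms(query_plan))      # 800
print(estimated_latency_ms(query_plan[2:]))  # 400: the two metadata requests skipped
```

Under this simple model, every request removed from the critical path, whether by caching its result or by issuing it concurrently with another request, saves about 200 ms.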
The author sets ambitious goals for the project: a cost-effective system that can handle a billion vectors for around $1,500 a month. The proposed architecture pairs a write-ahead log (WAL), which accepts writes immediately, with asynchronous background processes that build and maintain the indexes. The first implementation is slow, with queries taking about three seconds because of S3 round-trip latency and the overhead of listing files in S3. To speed things up, the author caches metadata and evicts cache entries with a Least Frequently Used (LFU) policy, cutting query times to around 400 milliseconds.
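The metadata cache can be sketched as a small LFU cache. This is a minimal illustration, not the author's code: the `LFUCache` class, the `loader` callback standing in for an S3 GET, and the `ns/...` key names are all hypothetical. Ties between equally frequent keys are broken by evicting the least recently used one, which is a common but not universal choice.

```python
from collections import OrderedDict, defaultdict
from typing import Callable


class LFUCache:
    """Minimal LFU cache. Evicts the least frequently used key; among keys with
    the same count, the least recently used one goes first."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.values: dict[str, object] = {}  # key -> cached value
        self.freqs: dict[str, int] = {}      # key -> access count
        self.buckets = defaultdict(OrderedDict)  # count -> keys, oldest first
        self.min_freq = 0

    def _touch(self, key: str) -> None:
        """Move key from its current count bucket to the next one."""
        f = self.freqs[key]
        del self.buckets[f][key]
        if not self.buckets[f]:
            del self.buckets[f]
            if self.min_freq == f:
                self.min_freq = f + 1
        self.freqs[key] = f + 1
        self.buckets[f + 1][key] = None

    def get(self, key: str, loader: Callable[[str], object]) -> object:
        """Return the cached value, calling loader(key) (e.g. an S3 GET) on a miss."""
        if key in self.values:
            self._touch(key)
            return self.values[key]
        value = loader(key)
        if len(self.values) >= self.capacity:
            # Evict the oldest key in the lowest-frequency bucket.
            victim, _ = self.buckets[self.min_freq].popitem(last=False)
            if not self.buckets[self.min_freq]:
                del self.buckets[self.min_freq]
            del self.values[victim]
            del self.freqs[victim]
        self.values[key] = value
        self.freqs[key] = 1
        self.buckets[1][key] = None
        self.min_freq = 1
        return value


calls: list[str] = []


def fake_s3_get(key: str) -> str:
    """Hypothetical stand-in for an S3 GET; records each call so misses are visible."""
    calls.append(key)
    return f"metadata for {key}"


cache = LFUCache(capacity=2)
cache.get("ns/manifest", fake_s3_get)  # miss
cache.get("ns/manifest", fake_s3_get)  # hit: manifest now has count 2
cache.get("ns/wal-0001", fake_s3_get)  # miss
cache.get("ns/wal-0002", fake_s3_get)  # miss: evicts wal-0001 (count 1), keeps manifest
print(sorted(cache.values))  # ['ns/manifest', 'ns/wal-0002']
```

Keeping frequently read objects such as a namespace manifest in memory lets repeated queries skip those S3 round trips entirely. The trade-off is that cached metadata can go stale when other writers change the underlying objects, which this sketch does not handle.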
Updates and deletes bring their own challenges. Rather than deleting data from S3 directly, the design writes tombstone records to the WAL for deletes and patch records for attribute updates. This keeps the system append-only, which suits how S3 works. The approach simplifies some operations but complicates others, particularly WAL replay, which slows down as records accumulate. Overall, the project explores database architecture built on S3 and shows both the potential and the limits of the approach.
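The append-only scheme can be sketched as a replay function that folds upserts, patches, and tombstones into the current state, oldest record first. The `WalRecord` layout, the `replay` function, and the patch semantics (merging attributes into the latest version and ignoring patches to missing documents) are assumptions for illustration, not the article's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WalRecord:
    op: str                                  # "upsert", "patch", or "delete" (tombstone)
    doc_id: str
    vector: Optional[list[float]] = None     # only set on upserts
    attrs: dict[str, str] = field(default_factory=dict)


def replay(wal: list[WalRecord]) -> dict[str, dict]:
    """Fold the append-only log into the current state, oldest record first."""
    state: dict[str, dict] = {}
    for rec in wal:  # one pass over every record: cost grows with the log
        if rec.op == "upsert":
            state[rec.doc_id] = {"vector": rec.vector, "attrs": dict(rec.attrs)}
        elif rec.op == "patch":
            if rec.doc_id in state:  # assumption: patches to missing/deleted docs are ignored
                state[rec.doc_id]["attrs"].update(rec.attrs)
        elif rec.op == "delete":
            state.pop(rec.doc_id, None)  # tombstone hides everything earlier for this id
        else:
            raise ValueError(f"unknown op: {rec.op!r}")
    return state


wal = [
    WalRecord("upsert", "a", [0.1, 0.2], {"color": "red"}),
    WalRecord("upsert", "b", [0.3, 0.4], {"color": "blue"}),
    WalRecord("patch", "a", attrs={"color": "green"}),
    WalRecord("delete", "b"),
]
print(replay(wal))  # {'a': {'vector': [0.1, 0.2], 'attrs': {'color': 'green'}}}
```

Nothing is ever overwritten: a delete or patch is just one more appended record. The cost shows up in `replay`, which must visit every record, so its running time grows linearly with the length of the log.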