Manu: A Cloud Native Vector Data Management System
June 28, 2022
Achieving Horizontal Scalability for 1 Billion+ Vector Collections: A Technical Deep Dive
Introduction to Building a Cloud Native Vector Database
In the era of learning-based embedding models, embedding vectors have become central to analyzing and searching unstructured data. Developers often pair popular vector search indexes with their existing data stores, but this approach runs into trouble once a collection grows past the billion-vector mark. Collections at that scale call for fully managed, horizontally scalable vector databases.
Technical Paper Overview:
This technical paper describes the design philosophy behind Manu, also known as Milvus, an open-source vector database built for cloud-native environments. Manu addresses the scalability requirements of collections that reach tens of billions of vectors. Its design draws on extensive dialogue with more than 1,700 industry users, which provided insight into real-world use cases and challenges.
Key Focus Areas:
- Scalability: Manu is engineered for large-scale vector collections and handles data efficiently on the order of tens of billions of vectors.
- Vision for Next-Generation Vector Databases: The paper outlines a roadmap for the future of vector databases, emphasizing long-term evolvability, tunable consistency, good elasticity, and high performance.
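To make the scalability goal above more concrete, the following is a minimal sketch of one common way to scale vector search horizontally: partition the collection into shards, run a local top-k search on each shard, and merge the partial results. This is an illustration under stated assumptions, not Manu's actual implementation. The function names, the modulo-based sharding scheme, and the brute-force L2 search are all hypothetical choices made for brevity.

```python
import heapq
import math
import random


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def partition(vectors, num_shards):
    """Assign each (id, vector) pair to a shard by id modulo num_shards.

    Hypothetical scheme for illustration; a real system may shard differently.
    """
    shards = [[] for _ in range(num_shards)]
    for vec_id, vec in vectors:
        shards[vec_id % num_shards].append((vec_id, vec))
    return shards


def search_shard(shard, query, k):
    """Brute-force local top-k on one shard, returned as (distance, id) pairs."""
    scored = ((l2_distance(vec, query), vec_id) for vec_id, vec in shard)
    return heapq.nsmallest(k, scored)


def scatter_gather_search(shards, query, k):
    """Query every shard, then merge the per-shard results into a global top-k."""
    partials = [search_shard(shard, query, k) for shard in shards]
    return heapq.nsmallest(k, (hit for part in partials for hit in part))


# Small synthetic collection: 1,000 random 8-dimensional vectors.
random.seed(0)
data = [(i, [random.random() for _ in range(8)]) for i in range(1000)]
query = [0.5] * 8

shards = partition(data, num_shards=4)
sharded = scatter_gather_search(shards, query, k=5)
single = search_shard(data, query, k=5)  # same search without sharding
```

Because each shard returns its own k nearest candidates, the merged result matches an exact search over the unsharded collection, so in this sketch `sharded` and `single` contain the same hits. In a deployed system the shards would live on separate nodes and typically use approximate indexes instead of brute-force scans.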
The design experience and insights shared in this paper contribute to the ongoing discussion about how vector databases should evolve. By addressing the challenges of managing very large vector collections and describing the features we expect from next-generation systems, we aim to move the field forward and support developers working on data analysis and search.