DiskANN started as a research project in 2018–2019 to address the large gap between vector search algorithms in the literature and the rapidly expanding scale and feature needs in industry.
Our research, with co-authors from MSR, Microsoft product groups, CMU, UMD, MIT, IITH, and UCI, addresses the following problems—many of which push the state of the art by an order of magnitude in one or more directions:
Some of the ideas are surveyed in a recent bulletin [6].
Many of these ideas are implemented in an open-source project [12], are widely used within Microsoft and across industry, and have inspired hardware adaptations. A few examples include:
Along the way, we realized there were few public datasets or benchmarks, so we partnered with other companies and universities to:
The code for this research [12] was forked many times internally and reimplemented externally, which made it hard to manage and to develop new algorithms. Further, because the 2023 version of DiskANN [12] was tied to specific points in the storage hierarchy and managed its own index terms, it was hard to integrate into databases, which prevented it from being hardened into a highly available and durable vector database.
With this in mind, since 2023 we have rewritten DiskANN in Rust with the following goals:
This allows DiskANN to be plugged into different databases or systems and to inherit the availability and durability of the host database. The host database can choose to operate DiskANN at different memory tiers suited to target cost-performance points. Our new version has been integrated with five (and counting) backends. It can also be connected to memory buffers to compete with FAISS, hnswlib, or the older "monolithic" in-memory DiskANN.
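To make the pluggable-backend idea concrete, here is a minimal sketch of how index logic can be written against a storage abstraction rather than a fixed storage tier. Everything in it is an assumption made for illustration: the `GraphStore` trait, the `InMemoryStore` type, and the `greedy_search` function are hypothetical names and are not taken from the DiskANN codebase. The search routine is a deliberately simplified greedy walk over a neighbor graph, not the search algorithm DiskANN actually uses.

```rust
use std::collections::HashMap;

/// Hypothetical storage abstraction (not the DiskANN API): the search
/// logic only asks the host for a vector or a neighbor list by id.
pub trait GraphStore {
    fn vector(&self, id: u32) -> Option<Vec<f32>>;
    fn neighbors(&self, id: u32) -> Vec<u32>;
}

/// One possible backend: plain in-memory hash maps. A database-backed
/// implementation would serve the same two calls from its own storage.
pub struct InMemoryStore {
    vectors: HashMap<u32, Vec<f32>>,
    edges: HashMap<u32, Vec<u32>>,
}

impl InMemoryStore {
    pub fn new() -> Self {
        InMemoryStore { vectors: HashMap::new(), edges: HashMap::new() }
    }

    pub fn insert(&mut self, id: u32, vector: Vec<f32>, neighbors: Vec<u32>) {
        self.vectors.insert(id, vector);
        self.edges.insert(id, neighbors);
    }
}

impl GraphStore for InMemoryStore {
    fn vector(&self, id: u32) -> Option<Vec<f32>> {
        self.vectors.get(&id).cloned()
    }

    fn neighbors(&self, id: u32) -> Vec<u32> {
        self.edges.get(&id).cloned().unwrap_or_default()
    }
}

/// Squared Euclidean distance between two vectors of equal length.
fn squared_l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

/// Simplified greedy walk: from `start`, repeatedly move to the closest
/// neighbor of the current node, stopping when no neighbor is closer to
/// the query. Returns None if the start node has no stored vector.
pub fn greedy_search<S: GraphStore>(store: &S, start: u32, query: &[f32]) -> Option<u32> {
    let mut current = start;
    let mut best = squared_l2(&store.vector(current)?, query);
    loop {
        let mut improved = false;
        for candidate in store.neighbors(current) {
            if let Some(v) = store.vector(candidate) {
                let d = squared_l2(&v, query);
                if d < best {
                    best = d;
                    current = candidate;
                    improved = true;
                }
            }
        }
        if !improved {
            return Some(current);
        }
    }
}
```

Because `greedy_search` is generic over `GraphStore`, the same search code would run unchanged over any backend that implements the two trait methods, whether it holds data in memory buffers or in a host database's storage layer. This is the kind of separation that lets a host database supply its own availability and durability guarantees.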
Integrated with Azure Cosmos DB for NoSQL, Microsoft's highly available geo-distributed database, DiskANN brings vector indexing into operational databases and is competitive with specialized serverless vector databases [7]. See the slides from our VLDB 2025 talk [23].
For a 25-minute overview of the project, see the slides from a talk at VLDB 2025 [24].