Harsha Vardhan Simhadri

Senior Principal Researcher, Microsoft Azure

email LinkedIn page Microsoft Research Webpage GitHub DBLP

I enjoy developing new algorithms motivated by real-world applications and systems. My PhD thesis developed parallel algorithms and run-times with provable guarantees for multi-core processors. Subsequently, I worked with an amazing team at Microsoft Research India, developing new ML operators and architectures for tiny IoT and edge devices (EdgeML).

In 2018, we started the DiskANN project for Approximate Nearest Neighbor Search (ANNS) to address the large gap between research and practice. We developed the first practical SSD-based ANNS system that can search a billion points in a few milliseconds, along with accurate real-time updates and fast predicated vector queries via Filtered-DiskANN, which incorporates vector and predicate data into index construction.

These ideas are widely deployed at scale in Microsoft for web and enterprise document search, advertisements and recommendation systems, and the Windows Copilot runtime, and have influenced many vector databases [DataStax Jvector, pgvectorscale, Pinecone Graph Algorithms] and hardware-accelerated designs [Intel OptaNNE for pmem, BANG for GPUs].

Recently, I joined Azure Data and have been working on a Rust re-write of DiskANN that inter-operates with databases (e.g., CosmosDB NoSQL, PostgreSQL), key-value stores, and plain old memory buffers and file systems. This project is the foundation for vector indices in Azure Databases such as CosmosDB NoSQL.

Along the way, we realized that there were precious few realistic datasets and benchmarks. So we created and curated new datasets via big-ann-benchmarks for the research community. We organized two competitions based on these datasets at NeurIPS'21 and NeurIPS'23. The first focused on billion-scale indices on standard and specialized hardware, while the second focused on practical variants of vector search such as streaming, sparse and filtered search. We are open to ongoing dataset and algorithm contributions to this effort.

Here is a short overview of DiskANN and a recording from the Northwestern IDEAL workshop.

Publications

Thesis
Program-Centric Cost Models for Locality and Parallelism

Students I have worked with
Grace Dinh, Chirag Gupta, Srajan Garg, Don Dennis, Shishir Patil, Suhas Jayaram Subramanya, Abhishek Panigrahi, Sachin Goyal, Moksh Jain, Oindrila Saha, Aditi Singh

Teaching Assistant

  • 15-750: Graduate Algorithms (Spring 2011)
  • 15-499: Parallel Algorithms (Spring 2009)

Earlier
2013-2016: Postdoctoral Fellow, CS Department, Lawrence Berkeley National Lab.
2007-2013: Ph.D., CS Department, Carnegie Mellon University, Advisor: Guy Blelloch
2003-2007: B.Tech, IIT Madras, Major: CS, Minor: Physics.