News
Veeam Uses Azure Cosmos DB to Power Planet-Scale Data Discovery
Veeam described how it rearchitected its data platform to move beyond traditional backup and recovery, using Azure Cosmos DB as the foundation for an AI-powered semantic search system.
In a co-authored post on Microsoft's DevBlogs site, Veeam engineers explained how the new architecture enables customers to locate files, messages, and documents across massive backup datasets in seconds.
Before this redesign, Veeam relied primarily on Azure Blob Storage for scale and durability. While effective for storage, Blob Storage was not designed for deep or fast search across billions of records. To address this limitation, Veeam built a new architecture around Azure Cosmos DB, pairing it with Azure Databricks, Azure Event Hubs, and Azure OpenAI Service to support real-time ingestion, transformation, and semantic query capabilities.
Azure Databricks pipelines process and transform incoming data at large scale, while Azure Event Hubs manages continuous ingestion streams. Azure OpenAI Service generates vector embeddings that allow the system to understand the context and meaning of queries rather than matching keywords alone. All processed data is stored and indexed in Azure Cosmos DB, which provides global distribution and low-latency access.
Veeam said it designed a hierarchical partitioning strategy to distribute data evenly across Cosmos DB partitions, preventing performance bottlenecks and sustaining throughput while keeping each logical partition within the platform's 20 GB limit. The schema combines customer identifiers, data type, and user information to optimize lookup performance while supporting multi-tenant scalability.
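To make the partitioning idea concrete, here is a minimal Python sketch of a three-level hierarchical key built from the fields the post mentions. It is an illustration, not Veeam's schema: the field names (customerId, dataType, userId), the sample values, and the toy bucket() hash are all assumptions, and the real service uses its own hashing and partition placement.

```python
from collections import Counter
from hashlib import sha256

# Hypothetical key levels, ordered from broadest (tenant) to narrowest (user).
PARTITION_KEY_PATHS = ("customerId", "dataType", "userId")


def partition_key(item: dict) -> tuple:
    """Return the item's hierarchical key as a tuple of its level values."""
    return tuple(item[path] for path in PARTITION_KEY_PATHS)


def bucket(key: tuple, buckets: int = 8) -> int:
    """Toy stand-in for service-side hashing: map a full key to a bucket."""
    digest = sha256("|".join(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


# Made-up backup items for one tenant: 4 users x 3 data types x 5 items each.
items = [
    {"id": f"{user}-{kind}-{n}", "customerId": "contoso",
     "dataType": kind, "userId": user}
    for user in ("alice", "bob", "carol", "dave")
    for kind in ("email", "file", "chat")
    for n in range(5)
]

# One tenant now maps to many distinct full keys instead of a single one.
keys = {partition_key(item) for item in items}
# Count how many items land in each toy bucket.
spread = Counter(bucket(partition_key(item)) for item in items)

print(len(keys), "distinct keys")
print(dict(spread))
```

Because the full key includes the data type and the user, a single large customer is split into many smaller key values rather than accumulating in one, which is the property that helps a tenant stay clear of a per-partition size limit. Queries that supply only the leading levels, such as the customer alone, can still be scoped to that tenant's data.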
The result is a hybrid semantic search engine that interprets user intent, blending metadata filters with vector-based similarity search. This allows Veeam customers to perform fast, AI-driven discovery across billions of backup items, turning what was once cold storage into an intelligent, queryable data source.
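The hybrid approach described above can be sketched in a few lines of Python: filter on exact metadata first, then rank the surviving items by vector similarity. This is an in-memory illustration under stated assumptions, not the production query path. The hybrid_search function, the field names, and the three-dimensional embeddings are hypothetical; in the system the post describes, embeddings come from Azure OpenAI Service and the search runs inside Azure Cosmos DB.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(items, query_vec, filters, top_k=3):
    """Apply exact-match metadata filters, then rank by vector similarity."""
    candidates = [
        item for item in items
        if all(item.get(field) == value for field, value in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda item: cosine(item["embedding"], query_vec),
        reverse=True,
    )
    return ranked[:top_k]


# Toy corpus with made-up embeddings (real embeddings have far more dimensions).
ITEMS = [
    {"id": "a", "customerId": "contoso", "dataType": "email", "embedding": [0.9, 0.1, 0.0]},
    {"id": "b", "customerId": "contoso", "dataType": "email", "embedding": [0.1, 0.9, 0.0]},
    {"id": "c", "customerId": "contoso", "dataType": "file", "embedding": [1.0, 0.0, 0.0]},
    {"id": "d", "customerId": "fabrikam", "dataType": "email", "embedding": [1.0, 0.0, 0.0]},
]

results = hybrid_search(
    ITEMS,
    query_vec=[1.0, 0.0, 0.0],
    filters={"customerId": "contoso", "dataType": "email"},
)
print([item["id"] for item in results])
```

Filtering before ranking keeps the similarity computation confined to one tenant and one data type, so items "c" and "d" are excluded even though their vectors match the query exactly.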
The post noted that the implementation demonstrates how managed cloud services can simplify scalability for complex workloads while maintaining global performance and compliance. The system supports the company's Data Cloud platform, expanding its capabilities from backup to data discovery.
About the Author
David Ramel is an editor and writer at Converge 360.