Veeam Uses Azure Cosmos DB to Power Planet-Scale Data Discovery

Veeam described how it rearchitected its data platform to move beyond traditional backup and recovery, using Azure Cosmos DB as the foundation for an AI-powered semantic search system.

In a co-authored post on Microsoft's DevBlogs site, Veeam engineers explained how the new architecture enables customers to locate files, messages, and documents across massive backup datasets in seconds.

Before this redesign, Veeam relied primarily on Azure Blob Storage for scale and durability. While effective for storing data, Blob Storage was not designed for deep or fast search across billions of records. To address this limitation, Veeam built a new architecture around Azure Cosmos DB, pairing it with Azure Databricks, Azure Event Hubs, and Azure OpenAI Service to support real-time ingestion, transformation, and semantic query capabilities.

Azure Databricks pipelines process and transform incoming data at large scale, while Azure Event Hubs manages continuous ingestion streams. Azure OpenAI Service generates vector embeddings that allow the system to understand the context and meaning of queries rather than matching keywords alone. All processed data is stored and indexed in Azure Cosmos DB, which provides global distribution and low-latency access.

Veeam said it designed a hierarchical partitioning strategy to distribute data evenly across Cosmos DB partitions, preventing performance bottlenecks and keeping each logical partition within the platform's 20 GB size limit. The partition key combines customer identifiers, data type, and user information to optimize lookup performance while supporting multi-tenant scalability.
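
To illustrate the idea, here is a minimal Python sketch of a three-level partition key built from the fields the post mentions. The field names (customerId, dataType, userId) are hypothetical, and the hashing function is a toy stand-in for illustration only; it is not how Cosmos DB places data internally, and it is not Veeam's actual schema. The definition dictionary follows the general shape Cosmos DB documents for hierarchical (MultiHash) partition keys.

```python
import hashlib

# Hypothetical key levels, ordered from broadest (tenant) to narrowest (user).
PARTITION_KEY_PATHS = ["/customerId", "/dataType", "/userId"]

# General shape of a hierarchical partition key definition in Cosmos DB.
# The paths themselves are assumptions made for this sketch.
PARTITION_KEY_DEFINITION = {
    "paths": PARTITION_KEY_PATHS,
    "kind": "MultiHash",
    "version": 2,
}


def partition_key_values(item: dict) -> list:
    """Return the item's key values in the same order as the key paths."""
    return [item[path.lstrip("/")] for path in PARTITION_KEY_PATHS]


def toy_bucket(values: list, buckets: int = 8) -> int:
    """Toy stand-in for partition placement: hash the full key into a bucket.

    Cosmos DB uses its own internal hashing; this only shows that adding
    levels to the key lets one large customer spread across several buckets.
    """
    digest = hashlib.sha256("|".join(values).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


item = {
    "id": "msg-001",
    "customerId": "contoso",
    "dataType": "email",
    "userId": "alice",
    "subject": "Quarterly report",
}

key = partition_key_values(item)
bucket = toy_bucket(key)
```

Because the customer identifier comes first, a query that supplies only the customer (or customer plus data type) still names a prefix of the key, which is the property that lets lookups stay scoped to one tenant's data.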

The result is a hybrid semantic search engine that interprets user intent, blending metadata filters with vector-based similarity search. This allows Veeam customers to perform fast, AI-driven discovery across billions of backup items, turning what was once cold storage into an intelligent, queryable data source.
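
The hybrid approach can be sketched in a few lines of Python: apply exact-match metadata filters first, then rank what remains by vector similarity to the query. This is an in-memory illustration under stated assumptions, not Veeam's implementation or the Cosmos DB query API. The item fields, the two-dimensional "embeddings," and the hybrid_search helper are all hypothetical; real embeddings from a model have far more dimensions.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(items: list, query_vector: list, filters: dict, top_k: int = 3) -> list:
    """Filter by metadata, then rank the survivors by embedding similarity."""
    candidates = [
        item for item in items
        if all(item.get(field) == value for field, value in filters.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda item: cosine_similarity(item["embedding"], query_vector),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data: hand-written vectors stand in for model-generated embeddings.
ITEMS = [
    {"id": "a", "customerId": "c1", "dataType": "email", "embedding": [1.0, 0.0]},
    {"id": "b", "customerId": "c1", "dataType": "email", "embedding": [0.6, 0.8]},
    {"id": "c", "customerId": "c1", "dataType": "file", "embedding": [1.0, 0.0]},
    {"id": "d", "customerId": "c2", "dataType": "email", "embedding": [1.0, 0.0]},
]

results = hybrid_search(
    ITEMS,
    query_vector=[1.0, 0.1],
    filters={"customerId": "c1", "dataType": "email"},
)
result_ids = [item["id"] for item in results]  # ["a", "b"]
```

Items "c" and "d" are excluded by the metadata filters even though their vectors match the query closely, which is the point of combining the two steps: filters enforce tenant and type boundaries, and similarity ranking orders only the items that pass them.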

The post noted that the implementation demonstrates how managed cloud services can simplify scalability for complex workloads while maintaining global performance and compliance. The system supports the company's Data Cloud platform, expanding its capabilities from backup to data discovery.

About the Author

David Ramel is an editor and writer at Converge 360.
