SQL Server Data Services
Data storage in the cloud.
Provisioning Internet-facing data access is vital to the success of the Web-based services hosted by Amazon.com Inc., Google Inc. and Microsoft. Renting cloud-based data services on a pay-as-you-go basis eliminates capital investment in extremely costly on-premises data centers and the recurring personnel costs for database maintenance, backup and restoration. The major players in this fledgling utility data-store market already own large-scale, globally dispersed data centers or, in Microsoft's case, are currently in the build-out process. Adding incremental capacity to handle on-demand customers' traffic surges is far less costly than constructing data centers from ground zero. Data store provisioning is automated, and clustering plus replication or its equivalent reduces maintenance to infrequent server replacement.
SQL Server Data Services (SSDS), which the SQL Server team announced in March, is one of several business-oriented Microsoft Online Services in the beta-testing stage. Like the other two queryable, hosted-data stores in beta-test today -- Amazon SimpleDB and Google App Engine (GAE) -- SSDS doesn't expose the related tables, columns and rows of conventional relational databases. Instead, all three data stores employ varying flavors of the entity-attribute-value (EAV) model. What's more, SSDS and SimpleDB don't support schemas -- though Microsoft has announced plans to support schemas in a future SSDS update. Entities have a rough correspondence to relational tables, attributes to columns, items to rows and values to cells. Microsoft uses the terms "flexible entities" and "flexible property bags" to distinguish SSDS constructs from conventional entity-relationship (E-R) terminology.
The EAV model simplifies entity metadata and lets developers create new entities without resorting to SQL's Data Definition Language (DDL). Another feature of the EAV model is the ability to add attributes on the fly, which is highly valuable for storage of medical records, drug tests and other semi-structured data with schemas that are difficult to predict accurately. SSDS has far fewer data types than SQL Server 2005, which is the service's database engine; SSDS offers only string, decimal, datetime, boolean and base64Binary scalar types. SSDS entities require a unique Id attribute value to identify an item, an optional Kind string attribute to specify its usage and a timestamp Version attribute. SSDS has no notion of JOINs, foreign keys or other relational niceties. But SSDS does implement concurrency-conflict detection by conditional updates, based on a combination of the Version attribute value and If-Match or If-None-Match headers for Representational State Transfer (REST) PUT operations, or ETags with the VersionMatch property of a SOAP message.
Maximum size of a conventional SSDS entity item is 2MB, which corresponds with the maximum length of a request/response message. Query responses of more than 500 items require paging. There's no limit to the number of attributes, provided that the total length of all attributes doesn't exceed 2MB. Blob is a recently introduced entity type that accepts streams of valid Multipurpose Internet Mail Extensions (MIME) content-types up to 100MB in size (in the current version at press time). SSDS automatically indexes scalar data types and will provide full-text search (FTS) shortly; the back-end currently supports FTS and the SSDS team is working on the FTS Web query interface. An ascending sort on Id (key) values is automatic, but SSDS doesn't support ORDER BY operations on other entities.
SSDS consists of three data-access layers over racks of commodity servers running clusters of custom SQL Server 2005 instances -- authorities, containers and entities, which the SSDS team abbreviates as ACE. Containers store one or more Entity sets and are the basic unit of data consistency and the boundary for queries. Authorities are units of service address, accounting, geo-scale and geo-location at a specific Microsoft data center. An authority corresponds approximately to a shared, clustered SQL Server instance that stores multiple data copies for reliability; data replicated between authorities at different geo-locations provides disaster protection. Containers store one or more Entity sets; they are the basic unit of data consistency and the boundary for queries. Future upgrades will support asynchronous fan-out queries over multiple containers. Containers have a maximum size of 2GB for conventional entities, but don't have a size limit for entities containing blobs.
REST and SOAP
SSDS is the most enterprise-oriented of the three primary cloud data stores, though SSDS doesn't implement role-based security yet. The SSDS team currently upgrades the back-end services with bimonthly "sprints." Sprint 3 rolled out on July 24, so Sprint 4 -- implementing features needed for the Professional Developers Conference 2008 -- should be underway at the end of September. The team released the SSDS software development kit, which consists of command-line and Windows-form client tools, in August.