In-Depth
Parsing the BSON Beast
Like JSON, only in binary format, BSON is now easier to parse with the built-in media type formatters included in the ASP.NET Web API 2.2 Client Libraries. Here's how.
Binary JSON, or BSON, is a format similar to JSON but, as the name suggests, encoded in binary. Developers like BSON because it's lightweight, with minimal space overhead; it's easy to parse; and its typed binary values make encoding and decoding efficient. With the release of the Microsoft ASP.NET Web API 2.2 Client Libraries, BSON can be parsed using the built-in media type formatters.
Before demonstrating BSON in use, I'll first describe the format itself. A BSON object consists of an ordered list of elements. Each element contains a field type, a field name and a value.
Field names are strings. Types can be any of the following:
- string
- integer (32-bit)
- integer (64-bit)
- double (64-bit)
- date (64-bit integer number of milliseconds since the Unix epoch)
- byte array
- boolean
- null
- BSON object
- BSON array
- regular expression
- JavaScript code
Because BSON stores values in their native binary types, there's no need to convert a string to a number, date or other type. This speeds up parsing and data retrieval compared with JSON or other text-based formats.
To demonstrate BSON, I'll use a previous article I wrote, "Implementing Binary JSON in ASP.NET Web API 2.1," as a starting point; it shows how to create a Web API service that renders BSON. Here, I'll modify that project to use more data types in BSON and examine the data structure passed to the client application.
In the Visual Studio solution for the Web API service, I modified Car.cs to add two DateTime properties, DateServiced and TimeServiced. These will be used later to illustrate how BSON encodes DateTime values. The modified file is shown in Listing 1.
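To make that difference concrete, here is a minimal C# sketch, assuming .NET Core 2.1 or later (for System.Buffers.Binary); the class and method names are illustrative only. It reads the Year value 2012 two ways: from text, which has to be parsed character by character, and from the four little-endian bytes a BSON int32 occupies, which are copied straight into an Int32.

```csharp
using System;
using System.Buffers.Binary;
using System.Globalization;

// Hypothetical helper class for illustration; not part of Web API or Json.NET.
public static class TypedValueSketch
{
    // A text format such as JSON carries "2012" as characters that must be parsed.
    public static int ParseFromText(string text) =>
        int.Parse(text, NumberStyles.Integer, CultureInfo.InvariantCulture);

    // BSON stores an int32 as four bytes, least significant byte first,
    // so the value is read directly with no text conversion.
    public static int ReadFromBson(byte[] bytes, int offset) =>
        BinaryPrimitives.ReadInt32LittleEndian(bytes.AsSpan(offset, 4));
}
```

For example, ReadFromBson(new byte[] { 0xDC, 0x07, 0x00, 0x00 }, 0) returns 2012, because 0x07DC is 2012.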
Listing 1: Complete Listing of Car.cs
using System;

namespace CarInventory21.Models
{
    public class Car
    {
        public Int32 Id { get; set; }
        public Int32 Year { get; set; }
        public string Make { get; set; }
        public string Model { get; set; }
        public string Color { get; set; }
        public DateTime DateServiced { get; set; } // Newly added field
        public DateTime TimeServiced { get; set; } // Newly added field
    }
}
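In CarController.cs, I modified the initialization of the cars array to include the newly added properties, as seen here:
// Requires: using System.Globalization; (for CultureInfo)
Car[] cars = new Car[]
{
    // InvariantCulture makes "MM/dd/yyyy" parse the same way on any machine.
    // A time-only string such as "08:32:00" takes the current date as its date part.
    new Car { Id = 1, Year = 2012, Make = "Chevrolet", Model = "Corvette", Color = "Red",
        DateServiced = Convert.ToDateTime("07/21/2014", CultureInfo.InvariantCulture),
        TimeServiced = Convert.ToDateTime("08:32:00", CultureInfo.InvariantCulture) },
    new Car { Id = 2, Year = 2011, Make = "Ford", Model = "Mustang GT", Color = "Silver",
        DateServiced = Convert.ToDateTime("08/16/2014", CultureInfo.InvariantCulture),
        TimeServiced = Convert.ToDateTime("08:33:00", CultureInfo.InvariantCulture) },
    new Car { Id = 3, Year = 2008, Make = "Mercedes-Benz", Model = "C300", Color = "Black",
        DateServiced = Convert.ToDateTime("06/30/2014", CultureInfo.InvariantCulture),
        TimeServiced = Convert.ToDateTime("08:34:00", CultureInfo.InvariantCulture) }
};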
Next, I'll make sure the Web API service is configured to send BSON by adding a BsonMediaTypeFormatter instance to the Formatters collection of the config object (of type HttpConfiguration). The complete listing of WebApiConfig.cs is shown in Listing 2.
Listing 2: The WebApiConfig.cs
using System.Net.Http.Formatting;
using System.Web.Http;

namespace CarInventory21
{
    public static class WebApiConfig
    {
        public static void Register(HttpConfiguration config)
        {
            // Web API configuration and services

            // Web API routes
            config.MapHttpAttributeRoutes();

            config.Routes.MapHttpRoute(
                name: "DefaultApi",
                routeTemplate: "api/{controller}/{id}",
                defaults: new { id = RouteParameter.Optional }
            );

            config.Formatters.Clear(); // Remove all other formatters
            config.Formatters.Add(new BsonMediaTypeFormatter()); // Enable BSON in the Web service
        }
    }
}
After all coding changes are complete, I'll set index.html as the start page. As you may recall, you do this by right-clicking the page in Solution Explorer and selecting Set As Start Page. The completed CarInventory solution is shown in Figure 1.
Now, when I run the application, index.html will be rendered, as seen in Figure 2.
When I click the /api/car link, the Web API responds by returning the complete Car[] array in BSON format. The response is returned to the browser, which prompts me to save or open the file. I'll save the file as car.json and then open it in Visual Studio 2013 to examine the binary structure. Looking at the overall structure, you'll see that the records (called documents in the BSON specification) are automatically indexed with 0-based numbers, as seen in Figure 3.
In addition, each record index is preceded by \x03, indicating an embedded document. This can be seen in Table 1, which outlines the type designations used in the BSON specification. The null byte (\x00) marks the end of each field name, each string value and each document throughout the structure.
As mentioned previously, each field within a record contains its type, name and value. The first record, highlighted in Figure 4, shows that the first field in the record is "Id", represented by the hex codes 49 64. Immediately before those bytes is a byte with the value \x10.
According to Table 1, the value \x10 designates a 32-bit integer, which is the data type of the Id field.
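The 0-based numbering comes from how the BSON specification stores arrays: as an ordinary document whose keys are the strings "0", "1", "2" and so on. The following minimal C# sketch (assuming .NET Core 2.1 or later; the class and method names are hypothetical) hand-encodes a single array slot: the \x03 type byte, the index written as a null-terminated key, and an embedded document. For brevity the embedded document is empty, so it holds only its int32 total size (5, which counts the size field itself) and the terminating \x00; a real Car record would carry its fields between those two parts.

```csharp
using System;
using System.Buffers.Binary;
using System.Globalization;
using System.IO;
using System.Text;

// Illustrative only: shows the byte layout, not how Json.NET writes BSON internally.
public static class ArraySlotSketch
{
    // An empty document: int32 total size (including these 4 bytes) plus the 0x00 terminator.
    public static byte[] EmptyDocument()
    {
        var doc = new byte[5];
        BinaryPrimitives.WriteInt32LittleEndian(doc.AsSpan(0, 4), doc.Length);
        doc[4] = 0x00;
        return doc;
    }

    // One array element: type 0x03, the index as a null-terminated key, then the document.
    public static byte[] EncodeSlot(int index, byte[] embeddedDocument)
    {
        var stream = new MemoryStream();
        stream.WriteByte(0x03); // element type: embedded document
        byte[] key = Encoding.UTF8.GetBytes(index.ToString(CultureInfo.InvariantCulture));
        stream.Write(key, 0, key.Length); // "0", "1", "2", ...
        stream.WriteByte(0x00); // the key is a cstring, so it ends with 0x00
        stream.Write(embeddedDocument, 0, embeddedDocument.Length);
        return stream.ToArray();
    }
}
```

BitConverter.ToString(ArraySlotSketch.EncodeSlot(0, ArraySlotSketch.EmptyDocument())) yields 03-30-00-05-00-00-00-00, where 30 is the ASCII code for the key "0".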
Table 1: BSON Data Type Designation
Type Value (Hex) or Grammar Rule | Type Description
--- | ---
document ::= int32 e_list "\x00" | BSON document; the int32 is the total number of bytes in the document
e_list ::= element e_list \| "" | Sequence of elements
\x01 | Floating point (64-bit)
\x02 | UTF-8 string
\x03 | Embedded document
\x04 | Array
\x05 | Binary data
\x06 | Undefined (deprecated)
\x07 | ObjectId (byte*12)
\x08 (value \x00) | Boolean "false"
\x08 (value \x01) | Boolean "true"
\x09 | UTC datetime: int64 milliseconds since the Unix epoch
\x0A | Null value
\x0B | Regular expression
\x0C | DBPointer (deprecated)
\x0D | JavaScript code
\x0E | Symbol
\x0F | JavaScript code w/ scope
\x10 | 32-bit integer
\x11 | Timestamp
\x12 | 64-bit integer
\xFF | Min key
\x7F | Max key
e_name ::= cstring | Key name
string ::= int32 (byte*) "\x00" | String
cstring ::= (byte*) "\x00" | CString
binary ::= int32 subtype (byte*) | Binary
subtype ::= "\x00" | Binary / Generic
subtype ::= "\x01" | Function
subtype ::= "\x02" | Old generic subtype
subtype ::= "\x03" | UUID
subtype ::= "\x05" | MD5
subtype ::= "\x80" | User defined
code_w_s ::= int32 string document | Code w/ scope
"Id" is followed by its null terminator (\x00), and the next four bytes, 01 00 00 00, hold the value 1 as a little-endian 32-bit integer. This is the value of the Id field for that record, and it completes the type-name-value pattern for the Id field. The same cycle repeats for the Year field.
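Decoding that element by hand is a quick way to confirm the layout. Below is a minimal C# sketch, assuming .NET Core 2.1 or later and assuming the first field's bytes are 10 49 64 00 01 00 00 00 (type, "Id", the name's null terminator, then the int32 value as four little-endian bytes, as the BSON specification defines). The class name is hypothetical, and only the \x10 type is handled.

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

// Hypothetical reader for a single 32-bit integer element (type 0x10).
public static class Int32ElementSketch
{
    public static (byte Type, string Name, int Value) Read(byte[] data, int offset)
    {
        byte type = data[offset++]; // 1 byte: element type

        // The name is a cstring: UTF-8 bytes up to the first 0x00.
        int nameEnd = Array.IndexOf(data, (byte)0x00, offset);
        if (nameEnd < 0)
            throw new FormatException("Unterminated field name.");
        string name = Encoding.UTF8.GetString(data, offset, nameEnd - offset);
        offset = nameEnd + 1; // skip the terminator

        if (type != 0x10)
            throw new NotSupportedException($"Type 0x{type:X2} is not handled in this sketch.");

        // An int32 value is always four bytes, least significant byte first.
        int value = BinaryPrimitives.ReadInt32LittleEndian(data.AsSpan(offset, 4));
        return (type, name, value);
    }
}
```

Calling Read(new byte[] { 0x10, 0x49, 0x64, 0x00, 0x01, 0x00, 0x00, 0x00 }, 0) returns the type 0x10, the name "Id" and the value 1.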
Make, Model and Color follow the same pattern with one difference: they each have a type designation of \x02 because they're string values. As Table 1 shows, a string value is stored as an int32 byte count (which includes the trailing null), followed by the characters and a terminating \x00. The characters themselves appear as their ASCII values (for example, "Red" = 52 65 64).
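The string layout from Table 1 can be seen by encoding the Color field of the first record. This minimal C# sketch (illustrative names, .NET Core 2.1 or later assumed) writes a \x02 element by hand; it reflects the byte layout the specification describes, not necessarily how Json.NET's BSON writer is implemented internally.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;
using System.Text;

// Hypothetical encoder for a single UTF-8 string element (type 0x02).
public static class StringElementSketch
{
    public static byte[] Encode(string name, string value)
    {
        var stream = new MemoryStream();
        stream.WriteByte(0x02); // element type: UTF-8 string

        byte[] nameBytes = Encoding.UTF8.GetBytes(name);
        stream.Write(nameBytes, 0, nameBytes.Length);
        stream.WriteByte(0x00); // e_name is a cstring

        byte[] valueBytes = Encoding.UTF8.GetBytes(value);
        var length = new byte[4];
        // The int32 prefix counts the value bytes plus the trailing 0x00.
        BinaryPrimitives.WriteInt32LittleEndian(length, valueBytes.Length + 1);
        stream.Write(length, 0, length.Length);
        stream.Write(valueBytes, 0, valueBytes.Length);
        stream.WriteByte(0x00); // string terminator
        return stream.ToArray();
    }
}
```

BitConverter.ToString(StringElementSketch.Encode("Color", "Red")) yields 02-43-6F-6C-6F-72-00-04-00-00-00-52-65-64-00: the type, "Color" and its terminator, the length 4, then "Red" and its terminator.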
The remaining two fields, DateServiced and TimeServiced, are both DateTime types. In the BSON format, both have a \x09 designation, indicating a UTC datetime stored as milliseconds since the Unix epoch. Unix time is defined as the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970; BSON counts milliseconds instead. Because this produces a large number, BSON stores it as a 64-bit integer.
Looking at the code in CarController.cs, the DateServiced value for the first record is 07/21/2014. According to an epoch converter, like the one at EpochConverter.com, midnight on July 21, 2014, Eastern time is equivalent to 1405915200000 milliseconds. This can be seen in Figure 5.
Notice that the dropdown box is set to Local Time, which is Eastern time for my location (EDT, or UTC-4, in July). Using a calculator, 1405915200000 converts to 14757137A00 in hex. However, the byte order in the BSON representation is reversed, as seen in Figure 6.
This is because BSON stores multi-byte numbers in little-endian order, with the least significant byte first. You can read more about byte order (endianness) at Wikipedia.
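The date arithmetic above can be reproduced in C# without an online converter. The following is a minimal sketch, assuming .NET Core 2.1 or later (for System.Buffers.Binary) and assuming the UTC-4 offset that the value 1405915200000 implies; the class and method names are illustrative, not part of Web API or Json.NET. It converts the date to Unix-epoch milliseconds, writes the 64-bit value least significant byte first as BSON does, and reads it back.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helpers mirroring how a BSON UTC datetime (type 0x09) is stored.
public static class BsonDateSketch
{
    // BSON counts milliseconds (not seconds) since 1970-01-01T00:00:00Z.
    public static long ToEpochMilliseconds(DateTimeOffset value) =>
        value.ToUnixTimeMilliseconds();

    // Writes the count as an int64, least significant byte first.
    public static byte[] ToLittleEndianBytes(long milliseconds)
    {
        var bytes = new byte[8];
        BinaryPrimitives.WriteInt64LittleEndian(bytes, milliseconds);
        return bytes;
    }

    // Reads the eight bytes back into a timestamp with a zero (UTC) offset.
    public static DateTimeOffset FromLittleEndianBytes(byte[] bytes) =>
        DateTimeOffset.FromUnixTimeMilliseconds(BinaryPrimitives.ReadInt64LittleEndian(bytes));
}
```

For example, ToEpochMilliseconds(new DateTimeOffset(2014, 7, 21, 0, 0, 0, TimeSpan.FromHours(-4))) returns 1405915200000, whose ToString("X") form is 14757137A00, and BitConverter.ToString(ToLittleEndianBytes(1405915200000)) yields 00-7A-13-57-47-01-00-00, the reversed byte order described above.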
With its use of binary data, BSON can transport a variety of data types. Because the data is already in its native format, there's no need to convert strings to integers or other types, which improves parsing performance. In addition, the built-in media type formatter in the Web API framework makes data conversion easy with only a few lines of code. Together, these features make BSON a powerful and easy-to-use format.