In-Depth

Parsing the BSON Beast

Like JSON, only in binary format, BSON is now easier to parse with built-in media type formatters that are included with ASP.NET Web API 2.2 Client Libraries. Here's how.

Binary JSON, or BSON, is a format similar to JSON that, as the name suggests, is encoded in binary. Developers like to use BSON because it's lightweight with minimal space overhead, it's easy to parse, and it encodes and decodes efficiently. With the release of the Microsoft ASP.NET Web API 2.2 Client Libraries, BSON can be parsed using the built-in media type formatters.

Before demonstrating the use of the BSON format, I'll first discuss the format itself. BSON objects consist of an ordered list of elements. Each element contains a field type, a name and a value.

Field names are strings. Types can be any of the following:

  • string
  • integer (32-bit)
  • integer (64-bit)
  • double (64-bit)
  • date (integer number of milliseconds)
  • byte array
  • boolean
  • null
  • BSON object
  • BSON array
  • Regular expression
  • JavaScript code

Because the BSON format allows for storing various data types, there's no need to convert a string to a given type. This accelerates parsing and data retrieval in comparison to JSON or other text-based formats.
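To make that point concrete, here's a minimal sketch contrasting the two approaches for the value 2012. The class and method names are hypothetical and exist only for illustration. It assumes the BSON value arrives as four bytes with the least significant byte first, which is how BSON stores 32-bit integers.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class, not part of the CarInventory21 project.
public static class TypedValueDemo
{
    // JSON carries the number as text, so the reader must parse characters.
    public static int FromJsonText(string text)
    {
        return int.Parse(text, CultureInfo.InvariantCulture);
    }

    // BSON carries the number as four bytes (least significant byte first)
    // that can be assembled into an Int32 with no text parsing.
    public static int FromBsonBytes(byte[] bytes)
    {
        return bytes[0] | (bytes[1] << 8) | (bytes[2] << 16) | (bytes[3] << 24);
    }
}
```

Calling TypedValueDemo.FromJsonText("2012") and TypedValueDemo.FromBsonBytes(new byte[] { 0xDC, 0x07, 0x00, 0x00 }) both return 2012, but only the first has to interpret characters.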

To demonstrate BSON, I'll rely on a previous article I wrote as a starting point, "Implementing Binary JSON in ASP.NET Web API 2.1," which shows how to create a Web API service that renders BSON. Moving forward, here I'll modify the project to utilize more data types in BSON. In addition, I'll examine the data structure passed to the client application.

Looking at the Visual Studio solution for the Web API service, I modified the Car.cs file to include two DateTime fields called DateServiced and TimeServiced. These will be used later to illustrate the BSON format of DateTime types. The modified file can be seen in Listing 1.

Listing 1: Complete Listing of Car.cs
using System;

namespace CarInventory21.Models
{
  public class Car
  {
    public Int32 Id { get; set; }
    public Int32 Year { get; set; }
    public string Make { get; set; }
    public string Model { get; set; }
    public string Color { get; set; }
    public DateTime DateServiced { get; set; }   // Newly added field
    public DateTime TimeServiced { get; set; }   // Newly added field
  }
}

In the CarController.cs file, I modified the instantiation of the Cars object to include the newly added fields, as seen here:

Car[] cars = new Car[]
{
  // Convert.ToDateTime parses with the current culture; a time-only string yields today's date at that time
  new Car { Id = 1, Year = 2012, Make = "Chevrolet", Model = "Corvette", Color = "Red",
    DateServiced = Convert.ToDateTime("07/21/2014"), TimeServiced = Convert.ToDateTime("08:32:00") },
  new Car { Id = 2, Year = 2011, Make = "Ford", Model = "Mustang GT", Color = "Silver",
    DateServiced = Convert.ToDateTime("08/16/2014"), TimeServiced = Convert.ToDateTime("08:33:00") },
  new Car { Id = 3, Year = 2008, Make = "Mercedes-Benz", Model = "C300", Color = "Black",
    DateServiced = Convert.ToDateTime("06/30/2014"), TimeServiced = Convert.ToDateTime("08:34:00") }
};

Next, I'll ensure the Web API service is configured to send the BSON format. To do this, I'll make sure the BsonMediaTypeFormatter object is being added to the config object (HttpConfiguration type). The complete listing of the WebApiConfig.cs is shown in Listing 2.

Listing 2: The WebApiConfig.cs
using System.Net.Http.Formatting;
using System.Web.Http;

namespace CarInventory21
{
  public static class WebApiConfig
  {
    public static void Register(HttpConfiguration config)
    {
      // Web API configuration and services

      // Web API routes
      config.MapHttpAttributeRoutes();

      config.Routes.MapHttpRoute(
        name: "DefaultApi",
        routeTemplate: "api/{controller}/{id}",
        defaults: new { id = RouteParameter.Optional }
      );
            
      config.Formatters.Clear();                             // Remove all other formatters
      config.Formatters.Add(new BsonMediaTypeFormatter());   // Enable BSON in the Web service
    }
  }
}

After all coding changes have been completed, I'll set index.html to be the start page. If you recall, this is done by right-clicking the page in Solution Explorer and selecting Set As Start Page. The completed CarInventory solution is shown in Figure 1.

Figure 1. Solution Explorer View of CarInventory21.sln

Now, when I run the application, index.html will be rendered, as seen in Figure 2.

Figure 2. View of Index.html

After I click the /api/car link, the Web API responds by returning the complete Car[] listing in BSON format. The results are returned to the browser, which prompts me to save or open the file. I'll save the file as car.json and then open it in Visual Studio 2013 so I can examine the binary structure. Looking at the overall structure, you'll see that the records (or documents, as the BSON specification calls them) are automatically indexed with a 0-based number, as seen in Figure 3.

Figure 3. BSON Records Containing a 0-Based Index

In addition, each record index is preceded by \x03, indicating it's an embedded document. This can be seen in Table 1, where the field designations used in the BSON specification are outlined. The null character (\x00) terminates field names, strings and documents throughout the structure.

As mentioned previously, each field within a record carries its type, name and value. The first record, highlighted in Figure 4, shows that its first field is "Id," represented by the hex codes 49 64. Immediately before those bytes is a byte with value \x10.

Figure 4. First Record in the Output

Looking at Table 1, the value \x10 designates a 32-bit integer, which is the data type of the Id field.

Table 1: BSON Data Type Designation
Type Value (Hex)                          Type Description
document ::= int32 e_list "\x00"          BSON document; int32 is the total number of bytes in the document
e_list ::= element e_list | ""            Sequence of elements
\x01                                      Floating point
\x02                                      UTF-8 string
\x03                                      Embedded document
\x04                                      Array
\x05                                      Binary data
\x06                                      Deprecated
\x07 (byte*12)                            ObjectId
\x08 \x00                                 Boolean "false"
\x08 \x01                                 Boolean "true"
\x09 int64                                UTC milliseconds in Unix epoch
\x0A                                      Null value
\x0B                                      Regular expression
\x0C                                      Deprecated
\x0D                                      JavaScript code
\x0E                                      Symbol
\x0F                                      JavaScript code w/ scope
\x10                                      32-bit integer
\x11                                      Timestamp
\x12                                      64-bit integer
\xFF                                      Min key
\x7F                                      Max key
e_name ::= cstring                        Key name
string ::= int32 (byte*) "\x00"           String
cstring ::= (byte*) "\x00"                CString
binary ::= int32 subtype (byte*)          Binary
subtype ::= "\x00"                        Binary / Generic
subtype ::= "\x01"                        Function
subtype ::= "\x02"                        Old generic subtype
subtype ::= "\x03"                        UUID
subtype ::= "\x05"                        MD5
subtype ::= "\x80"                        User defined
code_w_s ::= int32 string document        Code w/ scope

After the null byte that terminates the name "Id," the next four bytes (01 00 00 00) hold the 32-bit value 1. This is the value of the Id field for that record, and it completes the type-name-value pattern for the Id field. The same cycle repeats for the Year field within that record.
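The type-name-value pattern can be reproduced by hand. The following is a minimal sketch, not the code the Web API formatter uses; BsonInt32Sketch and Encode are hypothetical names. It writes a complete BSON document that holds a single 32-bit integer field, following the document, cstring and \x10 rows of Table 1.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: encodes a BSON document holding one 32-bit integer field.
public static class BsonInt32Sketch
{
    public static byte[] Encode(string name, int value)
    {
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);

        // int32 size + type byte + name + name terminator + int32 value + document terminator
        int total = 4 + 1 + nameBytes.Length + 1 + 4 + 1;

        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write(total);        // document size; BinaryWriter writes least significant byte first
            writer.Write((byte)0x10);   // type: 32-bit integer
            writer.Write(nameBytes);    // field name ...
            writer.Write((byte)0x00);   // ... terminated by a null byte (cstring)
            writer.Write(value);        // value as four bytes
            writer.Write((byte)0x00);   // end of document
            writer.Flush();
            return stream.ToArray();
        }
    }
}
```

BitConverter.ToString(BsonInt32Sketch.Encode("Id", 1)) yields 0D-00-00-00-10-49-64-00-01-00-00-00-00: the 13-byte length, the \x10 type, the name bytes 49 64 with their terminator, the value, and the closing null.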

Make, Model and Color follow the same pattern with a slight difference. They each have a type designation of \x02 because they're all string values. The values for the string fields are stored as their ASCII character codes (for example, "Red" = 52 65 64).
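As a rough illustration of the string layout, the sketch below encodes a single string element by following the string row of Table 1: an int32 length, the bytes, and a trailing null. BsonStringSketch and EncodeElement are hypothetical names, and the output is one element rather than a whole document.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: encodes one BSON string element (type, name, value).
public static class BsonStringSketch
{
    public static byte[] EncodeElement(string name, string value)
    {
        byte[] nameBytes = Encoding.UTF8.GetBytes(name);
        byte[] valueBytes = Encoding.UTF8.GetBytes(value);

        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            writer.Write((byte)0x02);            // type: UTF-8 string
            writer.Write(nameBytes);             // field name ...
            writer.Write((byte)0x00);            // ... terminated by a null byte
            writer.Write(valueBytes.Length + 1); // int32 length, counting the trailing null
            writer.Write(valueBytes);            // the characters themselves
            writer.Write((byte)0x00);            // string terminator
            writer.Flush();
            return stream.ToArray();
        }
    }
}
```

BitConverter.ToString(BsonStringSketch.EncodeElement("Color", "Red")) yields 02-43-6F-6C-6F-72-00-04-00-00-00-52-65-64-00, where the final 52 65 64 00 is "Red" plus its terminator.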

The remaining two fields, DateServiced and TimeServiced, are both DateTime types. In the BSON format, they both have a \x09 designation, indicating they're UTC milliseconds in Unix epoch. This is a time format defined as the number of milliseconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970. Because this time format produces a large number, the BSON format stores the number of milliseconds in a 64-bit integer field.

Looking at the code in CarController.cs, the DateServiced for the first record is 07/21/2014. Using an epoch converter, like the one at EpochConverter.com, July 21, 2014 at midnight Eastern Daylight Time (EDT, UTC-4) is equivalent to 1405915200000 milliseconds. This can be seen in Figure 5.
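The same conversion can be checked in code. This is a minimal sketch with hypothetical names (EpochSketch, ToUnixMilliseconds); it subtracts the Unix epoch from a UTC DateTime and divides the resulting ticks down to milliseconds. The sample input assumes that local midnight on July 21, 2014 falls at 04:00 UTC, which holds for a UTC-4 offset.

```csharp
using System;

// Hypothetical helper: converts a DateTime to milliseconds since the Unix epoch.
public static class EpochSketch
{
    private static readonly DateTime UnixEpoch =
        new DateTime(1970, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    public static long ToUnixMilliseconds(DateTime value)
    {
        // Normalize to UTC first so local times are shifted by their offset.
        TimeSpan elapsed = value.ToUniversalTime() - UnixEpoch;
        return elapsed.Ticks / TimeSpan.TicksPerMillisecond;
    }
}
```

EpochSketch.ToUnixMilliseconds(new DateTime(2014, 7, 21, 4, 0, 0, DateTimeKind.Utc)) returns 1405915200000.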

Figure 5. Screen Capture from Epochconverter.com

Notice the dropdown box is set to Local Time, which is Eastern time for my location. Using a calculator, 1405915200000 milliseconds converts to 14757137A00 in hex. However, the byte order of the value as stored in BSON is reversed, as seen in Figure 6.

Figure 6. Representation of DateServiced in Binary

This is due to the date being stored with the least significant byte first, a byte order known as little-endian. You can read more about this at Wikipedia.
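To see the reversal without a hex editor, the following sketch splits a 64-bit value into eight bytes, least significant byte first. LittleEndianSketch and ToLittleEndian are hypothetical names; the loop shifts and masks by hand so the result doesn't depend on the byte order of the machine running it.

```csharp
using System;

// Hypothetical helper: lays out a 64-bit integer least significant byte first.
public static class LittleEndianSketch
{
    public static byte[] ToLittleEndian(long value)
    {
        var bytes = new byte[8];
        for (int i = 0; i < 8; i++)
        {
            // Shift the wanted byte into the lowest position, then mask it off.
            bytes[i] = (byte)((value >> (8 * i)) & 0xFF);
        }
        return bytes;
    }
}
```

For the DateServiced value, 1405915200000L.ToString("X") gives 14757137A00, while BitConverter.ToString(LittleEndianSketch.ToLittleEndian(1405915200000L)) gives 00-7A-13-57-47-01-00-00, the same digits read one byte at a time from right to left and padded to eight bytes.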

With its use of binary data, BSON can be used to transport a variety of data types. Because the data is already in its native format, there's no need to convert strings to integers or other types, which helps parsing performance. In addition, the built-in media type formatter within the Web API framework makes data conversion easy with only a few lines of code. Combined, these features make BSON a powerful and easy format to use.
