Modern C++

SQLite Queries and Primary Keys

Learn about SQLite database engine internals so you can simplify your code and improve the performance of your applications.

More on This Topic:

In this third and final installment on using SQLite from C++, I want to begin exploring some of the internals of this amazing little database engine. SQLite does things quite differently than most mainstream SQL database engines; it helps to understand what those things are and how they work, so you can make the most of SQLite. If you haven't already done so, I encourage you to read the first two installments of this miniseries. I'll continue to build on the fundamental C++ abstractions I described in part 1 -- where I define the basic C++ abstractions -- and part 2 -- where I describe prepared statements and parameter binding.

Now, before I dive into the internals and distinctive features of the SQLite database engine, I want to wrap up one loose end from last month's column. I left off with a statement class that allowed me to create a prepared statement and bind values to the statement prior to execution. I could then call the step method to execute the prepared statement. An example is shown in Listing 1.

Listing 1: Calling the Step Method to Execute a Prepared Statement
connection c;
c.open();
c.execute("create table Hens ( Id int primary key, Name text )");

statement s;
s.prepare(c, "insert into Hens (Id, Name) values (?, ?)");

s.bind(1, 101);
s.bind(2, "Henrietta");
s.step();

s.reset();

s.bind(1, 102);
s.bind(2, "Rowena");
s.step();

I mentioned last month that the step method, which calls the SQLite sqlite3_step function, may be used both for evaluating non-queries such as this, as well as query statements for which it might be called repeatedly. However, I didn't provide an example of the latter. Having inserted a number of rows into the Hens table, I might want to query the table to see what it contains. Given the step method I wrote for last month's column returns true each time a new row of data is ready for processing, I can simply call the step method in a loop, as follows:

s.prepare(c, "select Id, Name from Hens ");

while (s.step())
{
  // Retrieve column values here
}

The question now is how to retrieve the values from the result set as the loop iterates over the rows. Again, this is surprisingly simple -- I can add a set of methods to get these column values. SQLite provides the sqlite3_column_int function that returns the value of a single column for the current result set for a particular statement object. I can then add another handy method to my statement class as follows:

auto get_int(int const column = 0)
{
  return sqlite3_column_int(handle.get(), column);
}

The sqlite3_column_int function simply returns a copy of the value from the given column, so this truly is quite simple. I should point out that the columns are indexed starting with zero rather than one, as is the case when binding values for a prepared statement. What about string columns? Well, fortunately, that's just as simple, thanks to the way SQLite deals with memory. The sqlite3_column_text function returns a pointer to a block of memory owned by the statement. This string will always be null terminated. Here's another wrapper method for my statement class:

auto get_string(int const column = 0)
{
  return reinterpret_cast<char const *>(sqlite3_column_text(handle.get(), column));
}

You need not, and certainly must not, free this memory. Just keep in mind that it's only valid while the statement continues to point to the same row of the result set. If you need to retain a copy of the value, then simply make a copy for yourself with a standard string class or any other type that's suitable. I can now complete the body of my query's while loop, as follows:

s.prepare(c, "select Id, Name from Hens");

while (s.step())
{
  printf("Id=%d Name=%s\n",
         s.get_int(0),
         s.get_string(1));
}

OK, so I now have most of my SQL bases covered, so to speak, with simple C++ constructs. I'm going to switch gears and talk a little about the internals of the SQLite engine. In the running example using the Hens table, I've use an integer primary key:

create table Hens ( Id int primary key, Name text )

What might not be so obvious is that all rows within SQLite tables include a 64-bit signed integer key, uniquely identifying the row within its table. This value is called "rowid" and is generally the fastest way to identify a row or range within a table. It trumps even the primary key that I've defined. I could, for example, update the previous example of selecting the Hens table and request both the Id and the rowid, as follows:

s.prepare(c, "select Id, rowid, Name from Hens");

This formal Id column will not end up as a primary key in the same way you might think of a primary key in mainstream SQL database engines. In Microsoft SQL Server, for example, a primary key usually equates to a clustered index. In SQLite, a primary key is usually just a unique index and is distinct from the special rowid column name. Given that the rowid is a 64-bit value, I can add another method to my statement class to return 64-bit integer values:

auto get_int64(int const column = 0)
{
  return sqlite3_column_int64(handle.get(), column);
}

I can then update the previous while loop to print out not only the Id, but also the rowid for each row:

while (s.step())
{
  printf("Id=%d, rowid=%I64d, Name=%s\n",
         s.get_int(0),
         s.get_int64(1),
         s.get_string(2));
}

When I run this loop, the following is printed out:

Id=101, rowid=1, Name=Henrietta
Id=102, rowid=2, Name=Rowena

As you can clearly see, the SQLite database engine automatically assigns unique rowid values distinct from my table's Id primary key, which isn't really the primary key.

There are, of course, a few exceptions. SQLite will assume an explicitly named integer primary key is just an alias for the rowid if the type is spelled out as "INTEGER" rather than "INT." I can simply change my create table statement as follows:

create table Hens ( Id integer primary key, Name text )

Notice the only change is in the type name for the Id column. Now, if I re-run the query I get the following results:

Id=101, rowid=101, Name=Henrietta
Id=102, rowid=102, Name=Rowena

In this case, it's clear the Id and rowid are synonyms. Of course, you can always omit the Id "primary key" all together and just rely on the rowid:

create table Hens ( Name text )

Although it appears as if the table has no primary or unique key, the rowid continues to play the part and lookup performance remains optimal. In fact, given that SQLite will happily generate the rowid values, we can simplify our insertion statements like so:

statement s;
s.prepare(c, "insert into Hens (Name) values (?)");
s.bind(1, "Henrietta");
s.step();

s.reset();

s.bind(1, "Rowena");
s.step();

Here, I'm only binding the Name column and SQLite will generate the rowid for me. Of course, you might actually need to know what value was used for a particular insertion. Fortunately, SQLite provides the sqlite3_last_insert_rowid function that returns the rowid from the most recent insertion on the given connection. I can then simply add a handy rowid method to my connection class, as follows:

auto rowid()
{
    return sqlite3_last_insert_rowid(handle.get());
}

Using it is equally simple: I can call rowid after any insertion to retrieve the newly inserted row's unique rowid:

statement s;
s.prepare(c, "insert into Hens (Name) values (?)");
s.bind(1, "Henrietta");
s.step();

auto id = c.rowid();

There might be rare cases where this automatically generated rowid isn't optimal. I might have a table with a completely different primary key, perhaps a unique text column, and I don't want the overhead of storing and maintaining the rowid integer index. In that case, the WITHOUT ROWID clause may be added to the table creation statement:

create table Hens ( Name text primary key ) without rowid

SQLite does, however, require a primary key. So if I ask it to create a table without a rowid, I must specify a primary key as I've done here. It's unlikely that you can beat the performance of rowid with integer primary keys, so you should only use this option if you have a non-integer primary key and determined that it actually improves your performance or space requirements.

And that's all I have time for this month. With a little insider knowledge about how the SQLite database engine works, you can simplify your code and improve the performance of your applications. Join me next month, as I start a new series about building compilers.

About the Author

Kenny Kerr is a computer programmer based in Canada, an author for Pluralsight and a Microsoft MVP. He blogs at kennykerr.ca and you can follow him on Twitter at twitter.com/kennykerr.

comments powered by Disqus

Featured

Subscribe on YouTube