How I Created a RavenDB Python Client
You might be surprised how easy it was to port this NoSQL database for .NET to the Python language.
- By Idan Haim Shalom
- 09/23/2016
When I started working at Hibernating Rhinos and took part in developing RavenDB, I was tasked with creating a RavenDB client in Python.
Python is a dynamic and flexible language, which let me start developing and experimenting with RavenDB much more easily. I could build the REST methods in an instant and get responses from the server in the early stages.
While developing the Python client, I had to stay faithful to RavenDB's core ideas. RavenDB is a NoSQL database for .NET: it is open source, speed-obsessed, supports ACID transactions, and offers high availability and replication, among other features. Its flexibility as a document database makes it easy to hold dynamic values: you can save whatever types you choose, change one document or many, and add or delete fields as you please. It just makes sense to program against it in a dynamic language like Python.
The most important thing was to make a client that would be easy to use just like the .NET client. I was able to "Unleash the power of Python" and follow the Python way without losing any functionality in the process.
The first issue I encountered was how to build my methods and how to let Python users know which arguments to pass to them without prior knowledge of the RavenDB Python client. In C#, types are declared in advance:
public T Load<T>(string id)
There are two ways to solve this. The first is to use isinstance() to check every argument and raise an exception when a check returns False.
The second is to use duck typing, which is an advantage of Python. (From Wikipedia: "Duck typing is an application of the duck test in type safety. It requires that type checking is deferred to runtime, and is implemented by means of dynamic typing or reflection.")
I decided to use both solutions. I used the first where duck typing raised no exception at all, or raised one that didn't explain the problem:
def load(self, key_or_keys, object_type=None, includes=None):
    if not key_or_keys:
        raise ValueError("None or empty key is invalid")
    if includes and not isinstance(includes, list):
        includes = [includes]
    if isinstance(key_or_keys, list):
        return self._multi_load(key_or_keys, object_type, includes)
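The explicit checks in load can be tried outside the client. The following is a minimal sketch, not the client's code: normalize_load_args is a hypothetical helper that mirrors the same checks, rejecting an empty key and wrapping a single include in a list.

```python
def normalize_load_args(key_or_keys, includes=None):
    """Hypothetical helper mirroring the checks in load(); not part of the client."""
    # Reject None, "" and [] up front with a clear message.
    if not key_or_keys:
        raise ValueError("None or empty key is invalid")
    # Accept a single include as a convenience and wrap it in a list.
    if includes and not isinstance(includes, list):
        includes = [includes]
    # Report whether the caller asked for several documents at once.
    is_multi = isinstance(key_or_keys, list)
    return key_or_keys, includes, is_multi


print(normalize_load_args("foos/1", includes="owner"))
print(normalize_load_args(["foos/1", "foos/2"]))
```

Without the isinstance check on includes, a plain string would be iterated character by character later on, which is the kind of silent or unexplained failure that explicit checks guard against.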
I use duck typing when I can be sure users will understand the problem from the exception they get:
def save_entity(self, key, entity, original_metadata, metadata, document, force_concurrency_check=False):
    self._known_missing_ids.discard(key)
    if key not in self._entities_by_key:
        self._entities_by_key[key] = entity
    self._entities_and_metadata[self._entities_by_key[key]] = {
        "original_value": document.copy(),
        "metadata": metadata,
        "original_metadata": original_metadata,
        "etag": metadata.get("etag", None),
        "key": key,
        "force_concurrency_check": force_concurrency_check}
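Note that the call to document.copy() assumes document is a dict; passing an object without a copy method raises an AttributeError that makes the mistake obvious.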
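A short sketch shows what that duck-typed failure looks like in practice. The function name snapshot is made up for illustration and is not part of the client.

```python
def snapshot(document):
    # Duck typing: no isinstance check, anything with a copy() method is accepted.
    return document.copy()


original = {"name": "PyRavenDB"}
saved = snapshot(original)
saved["name"] = "changed"
print(original["name"])  # the original dict is untouched by changes to the copy

try:
    snapshot(42)  # an int has no copy() method
except AttributeError as error:
    print(error)
```

The exception names both the offending type and the missing method, so no extra validation code is needed to explain what went wrong.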
The RavenDB .NET client makes life easier by using reflection to create the correct object, and for every variable that doesn't exist in the document it returns the default value (for example, strings get an empty string). You can already see the problem with that in Python:
- How can I really know the default value of the variable if the user can assign different default values?
- How can I make sure that I get all the right fields for the object?
- How can I make sure not to get any exceptions during the initialization?
That takes us to the second issue. Along with the document, RavenDB saves a dict (the metadata) with more information about the document. To help solve the issue, I added a Raven-Python-Type property to the metadata, whose value is the class name together with its module. Then, I can try to import the class when I want to load or query a document ("Raven-Python-Type": "__main__.Foo"); see Figure 1:
def import_class(name):
    components = name.split('.')
    try:
        mod = __import__(components[0])
        for comp in components[1:]:
            mod = getattr(mod, comp)
        return mod
    # AttributeError covers a module that imports fine but lacks the class
    except (ImportError, ValueError, AttributeError):
        pass
    return None
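The round trip from a class to its stored name and back can be sketched with the standard importlib module. This is an illustrative variant, not the client's code: python_type_name and import_class_by_path are hypothetical names, and the sketch assumes top-level (non-nested) classes.

```python
import importlib
from fractions import Fraction


def python_type_name(cls):
    """Hypothetical helper: build the 'module.ClassName' string stored in metadata."""
    return "{0}.{1}".format(cls.__module__, cls.__name__)


def import_class_by_path(path):
    """Hypothetical variant of import_class; returns None when nothing is found."""
    module_name, _, class_name = path.rpartition(".")
    if not module_name:
        return None
    try:
        # import_module also handles dotted module paths such as "package.module".
        module = importlib.import_module(module_name)
        return getattr(module, class_name)
    except (ImportError, AttributeError, ValueError, TypeError):
        return None


stored = python_type_name(Fraction)
print(stored)
print(import_class_by_path(stored) is Fraction)
print(import_class_by_path("no_such_module_xyz.Foo"))
```

Returning None instead of raising lets the caller fall back to another representation when the class is no longer importable.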
The next step will be to check and build the class from the document I got from the server, as shown in Listing 1.
Listing 1: Check, Build Foo Class
class Foo(object):
    def __init__(self, name, dependencies=None, save_in_version="2.7.9"):
        self.name = name
        self.dependencies = dependencies
        self.save_in_version = save_in_version

{
    "name": "PyRavenDB",
    "dependencies": [
        "pycrypto >= 2.6.1",
        "requests >= 2.9.1",
        "inflector >= 2.0.11",
        "enum >= 0.4.6"
    ]
}
Listing 1 shows the class Foo together with the document I get from the server. The Foo class was modified after the document was saved to the server.
I have to know which variables the class takes, whether the document contains them, and how to initialize them with the right values. Finally, I need the default values of the variables I couldn't fetch from the document because the class has changed (see save_in_version in class Foo). In C# I can derive the default value from the type; in Python, this line:
args, __, __, defaults = inspect.getargspec(entity.__class__.__init__)
will give me all the variables in the __init__ method and all the default values. Then I'll execute this code to make the match:
if (len(args) - 1) != len(document):
    remainder = len(args)
    if defaults:
        remainder -= len(defaults)
    for i in range(1, remainder):
        entity_initialize_dict[args[i]] = document.get(args[i], None)
    for i in range(remainder, len(args)):
        entity_initialize_dict[args[i]] = document.get(args[i], defaults[i - remainder])
else:
    entity_initialize_dict = document
entity.__init__(**entity_initialize_dict)
(Get more information on the getargspec function.)
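On current interpreters the same matching can be sketched with inspect.signature, since inspect.getargspec was removed in Python 3.11. This is a self-contained illustration under that assumption, not the client's implementation: build_entity is a hypothetical helper, and unlike the code above it constructs a new object instead of calling __init__ on an existing one.

```python
import inspect


class Foo(object):
    def __init__(self, name, dependencies=None, save_in_version="2.7.9"):
        self.name = name
        self.dependencies = dependencies
        self.save_in_version = save_in_version


def build_entity(object_type, document):
    """Hypothetical helper: create object_type from a document dict."""
    init_kwargs = {}
    parameters = inspect.signature(object_type.__init__).parameters
    for name, parameter in parameters.items():
        if name == "self":
            continue
        # Skip *args and **kwargs; they have no single matching field.
        if parameter.kind in (inspect.Parameter.VAR_POSITIONAL,
                              inspect.Parameter.VAR_KEYWORD):
            continue
        # Parameters without a default fall back to None, as in the loop above.
        if parameter.default is inspect.Parameter.empty:
            default = None
        else:
            default = parameter.default
        init_kwargs[name] = document.get(name, default)
    return object_type(**init_kwargs)


document = {"name": "PyRavenDB", "dependencies": ["requests >= 2.9.1"]}
foo = build_entity(Foo, document)
print(foo.name, foo.dependencies, foo.save_in_version)
```

The document has no save_in_version field, so the value comes from the default declared in __init__, which is the case the matching code is meant to handle.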
After making the match, I can pass entity_initialize_dict to __init__. This solves many problems going forward, but it has a limit. If our class inherits from another class and its __init__ method doesn't list all the fields of the base class, getargspec won't return them, and I can lose important information about the class (the result is an incomplete object). The method returns a _DynamicStructure if it fails to import the class or if the object_type variable (explained below) is None:
class _DynamicStructure(object):
    def __init__(self, **entries):
        self.__dict__.update(entries)

    def __str__(self):
        return str(self.__dict__)
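As a quick usage sketch, the class can be exercised on its own with a plain dict standing in for a document from the server. The sample document and the added tags field are illustrative only.

```python
class _DynamicStructure(object):
    def __init__(self, **entries):
        self.__dict__.update(entries)

    def __str__(self):
        return str(self.__dict__)


# A plain dict standing in for a document returned by the server.
document = {"name": "PyRavenDB", "dependencies": ["requests >= 2.9.1"]}
entity = _DynamicStructure(**document)

print(entity.name)        # top-level fields become attributes
entity.tags = ["nosql"]   # new fields can be added at runtime
print(entity)
```

Only top-level keys become attributes; nested dicts inside the document stay plain dicts, and every key must be a string because the document is unpacked as keyword arguments.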
The RavenDB .NET client makes heavy use of generics. RavenDB tries to be as strongly typed as it can be, and I can understand why (no one wants errors):
Foo foo = session.Load<Foo>("foos/1");
In Python, it's a little different. Python doesn't need any of that because of its dynamic structure. I can add and change every value I want in every class I want during runtime. Still, I wanted to add the option to get any type of object or the actual type that's specified in the document metadata.
For that, I added the object_type parameter (None by default).
In object_type, users can pass any class they want, and if the client finds a match against the type specified in the document metadata (Raven-Python-Type), we get the right class. If object_type is not set and the metadata has no Raven-Python-Type, we get a dynamic entity (see _DynamicStructure):
foo = session.load("foos/1", object_type=Foo)
In the end, I overcame all these issues and created a Python client for RavenDB that can handle most CRUD scenarios, including full support for replication, failover, dynamic queries and so on. Yes, it's that simple.
About the Author
Idan Haim Shalom is a software developer with Hibernating Rhinos, developers of the RavenDB NoSQL database.