Wednesday, November 20, 2013

Integrating SQLAlchemy into Django

In my previous post I describe some of the limitations that I have hit while using Django's ORM for our RESTful API here at HireVue.  In this post I will describe how I integrated SQLAlchemy into our Django app for read-only (GET) requests, to handle the queries that Django doesn't allow.

Let's say we are running with a single database instance, so our django settings look something like

DATABASES {
    'default' : {
        'ENGINE' : 'django.db.backends.postgresql_psycopg2',
        'NAME' : 'mydatabase',
        'USER' : 'mydatabaseuser',
        'PASSWORD' : 'mypassword',
        'HOST' : '127.0.0.1',
        'PORT' : '5432',
    }
}

We will continue to let Django handle all of the connection management, so when SQLAlchemy needs a database connection, we want to just use the current connection.  The method to get this connection looks like

# custom connection factory, so we can share with django
def get_conn():
    from django.db import connections
    conn = connections['default']
    return conn.connection

Now we want SQLAlchemy to call get_conn whenever it needs a new connection.  In addition, we need to keep SQLAlchemy from trying to pool the connection, clean up after itself, etc.  We essentially need it to do absolutely no connection handling.  To accomplish this, we create the SQLAlchemy engine with a custom connection pool that looks like

# custom connection pool that doesn't close connections, and uses our
# custom connection factory
class SharingPool(NullPool):
    def __init__(self, *args, **kwargs):
        NullPool.__init__(self, get_conn, reset_on_return=False,
                          *args, **kwargs)

    def status(self):
        return 'Sharing Pool'

    def _do_return_conn(self, conn):
        pass

    def _do_get(self):
        return self._create_connection()

    def _close_connection(self, connection):
        pass

    def recreate(self):
        return self.__class__(self._creator,
                              recycle=self._recycle,
                              echo=self.echo,
                              logging_name=self._orig_logging_name,
                              use_threadlocal=self._use_threadlocal,
                              reset_on_return=False,
                              _dispatch=self.dispatch,
                              _dialect=self._dialect)

    def dispose(self):
        pass

The magic is where we pass get_conn to the NullPool constructor, and then we override most of the other methods to do nothing.  This allows SQLAlchemy to borrow the connection that Django is managing, without interfering.

Now we can create an engine with this pool like so:

# create engine using our custom pool and connection creation logic
engine = create_engine(db_url, poolclass=SharingPool)

Since we want to keep DRY, we use Django's settings to create the db_url that is passed to the engine constructor:

db_url = 'postgresql+psycopg2://{0}:{1}@{2}:{3}/{4}'.format(
            settings.DATABASES['default']['USER'],
            settings.DATABASES['default']['PASSWORD'],
            settings.DATABASES['default']['HOST'],
            settings.DATABASES['default']['PORT'],
            settings.DATABASES['default']['NAME'])

We also don't want to maintain table definitions separate from our Django models, so we use SQLAlchemy reflection to build the metadata:

# inspect the db metadata to build tables
meta = MetaData(bind=engine)
meta.reflect()

This is basically it!  Now you can get the tables you would like to query from the meta object, and create and run selects.  They will run on the connection provided by Django.  Don't try to do any transaction handling or it may mess up Django.  

Also, you'll want to make sure you reflect() on startup, so that tables are ready to go when you need them.