Monday, October 28, 2013

Limitations of Django's ORM

I'm currently working at HireVue, where we have a public RESTful API that we consume for our website. I have done quite a bit of work on the API, and in the process, have run into some of the limitations of the Django ORM.

Difficulty controlling joins

Let's say I have an Interview table for storing information about candidate interviews, an Evaluation table for storing interviewer evaluations, and a User table for information on the interviewers/evaluators.  The Evaluation table is a many-to-many connecting Users and Interviews, with evaluation information stored in the table as well.


Now I want to write a query that returns interviews that a specific user has evaluated highly.  I try something like this:

Interview.objects.filter(evaluation__rating__gt=5, evaluation__user__username='specificuser')

I cross my fingers and hope that Django only joins to the evaluation table once, and uses that same join for both filter conditions.  It seems to work in the cases I have tried, but there is no way for me to tell Django for sure that is what I want.

On the other hand, let's say I want to write a query that returns interviews that a specific user has evaluated low, but others have evaluated high.  How do I tell Django I need two different joins?

Interview.objects.filter(evaluation__rating__lt=3, evaluation__user__username='specificuser') \
        .filter(evaluation__rating__gt=5).distinct()

Will this work?  I have had mixed luck, and find myself banging my head against the Django ORM black box, wishing I could tell it more explicitly what I want.  The Django docs are very reassuring on this matter:
 Conditions in subsequent filter() or exclude() calls that refer to the same relation may end up filtering on different linked objects.
I'm also limited if I am trying to do filtering in different parts of my code, on the same relationship.  Let's say I have some 'permissions' functions that apply a filter so that the request can only see interviews that the current user has evaluated, and then other functions to filter based on search criteria, like whether the evaluation was high or low:

query = Interview.objects.all() 
query = apply_permissions(query) # filter(evaluation__user__username='specificuser')
query = apply_search(query) # filter(evaluation__rating__gt=5)

I don't know of any way to tell Django that both of the filters, applied by different methods, should both use the same joined table.  This makes it difficult to break code out into logical components like this.

Inability to force outer joins

This is the bigger problem, in my opinion:  I am unaware of any way to tell Django to use an outer join.  Others seem to agree that this is not possible.

What if I would like to return all interviews, and if the current user has evaluated the interview, I'd like to include the rating.  This is easily done using a LEFT OUTER JOIN in sql.  Django, however, does not allow a query like this, leaving us with extra queries and joins in code.  That may not seem like too big of a pain in most cases, but if you are querying against a large data set, and want to sort by the ratings in that left outer joined table, things get even more difficult.

In my experience, these are my biggest complaints with Django's ORM.  It's frustrating to be held back by limitations in the ORM, when I know that the database can handle the problem easily.  I find myself jumping through hoops, or coding around these limitations way too often, and thought I'd share, for others considering using Django.

I'll follow up this post with another describing how I recently bolted SQLAlchemy onto our existing Django app, to handle the 'tough' queries.

No comments:

Post a Comment