Get search done with PostgreSQL
A few words of wisdom…
A search feature is now a part of almost every website regardless of its size or purpose.
If you’re a Django developer, then you’re lucky to have django-haystack at your disposal to hook up any of its available search engines, such as ElasticSearch or Solr. There’s also a library called ElasticUtils, for dealing exclusively with the ElasticSearch engine (if you wish to have more control over your searching power).
But what if you only need to plug in simple search functionality to your awesome new blogging website?
Well, if you’re using Django together with PostgreSQL as your database engine (which you should!), you can actually use some of PostgreSQL’s built-in features to get search running quickly.
I’m about to show you how to set it up in a few simple steps. Also, I will add a quick tip on how to add autocomplete super-powers.
First things first, you need a small 3rd party app called djorm-ext-pgfulltext:

```shell
pip install djorm-ext-pgfulltext
```
Make sure the unaccent extension is enabled in your database (it normalizes accented characters during indexing):

```sql
CREATE EXTENSION unaccent;
ALTER FUNCTION unaccent(text) IMMUTABLE;
```
Next, let’s assume you have a simple blog post model and you want to be able to search by its content:
```python
from django.db import models
from djorm_pgfulltext.models import SearchManager
from djorm_pgfulltext.fields import VectorField


class BlogPost(models.Model):
    title = models.CharField(max_length=255, default="")
    content = models.TextField(default="")

    search_index = VectorField()

    objects = SearchManager(
        fields=('content',),
        config='pg_catalog.english',
        search_field='search_index',
        auto_update_search_field=True
    )
```
This setup will automatically update the index field when the object is saved.
If you already had some data, you would need to run

```shell
./manage.py update_search_field [appname]
```

to rebuild the index fields.
…and we’re done with the setup part.
Now let’s see how we go about searching through the content.
For testing purposes I’ve added ~50k objects with some random text data.
Let’s say we want to search for either the text1 or text2 keyword. Here’s how we do it:

```python
word_list = ['text1', 'text2']
query = " | ".join(word_list)  # "text1 | text2" -- OR the keywords together
qs = BlogPost.objects.search(query, rank_field="rank", raw=True)[:15]
```
The library includes a custom manager method called search, which takes as arguments:

- the query string,
- the rank field to be included in the results (so the search results will be ordered by highest rank value),
- the raw option to indicate usage of to_tsquery.

to_tsquery is more powerful since it allows us to define our own search query. You can read more about the to_tsquery function and ranking here.
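To make the moving parts concrete, here is roughly the kind of SQL such a search translates to. This is a hedged sketch, not the library’s exact output; the table name search_blogpost is inferred from the index name shown later in this post:

```sql
-- Approximate SQL behind the search() call above (schema names assumed).
SELECT *,
       ts_rank(search_index, to_tsquery('pg_catalog.english', 'text1 | text2')) AS rank
FROM   search_blogpost
WHERE  search_index @@ to_tsquery('pg_catalog.english', 'text1 | text2')
ORDER  BY rank DESC
LIMIT  15;
```

The @@ operator does the actual matching, while ts_rank only computes the ordering value for the rows that matched.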
While testing on 50k rows, I got query times of ~130ms, which by search standards is not that great, and it’s not ideal for a responsive search widget either.
So how can we make it faster?
When looking at the database indexes, we notice the search_index field has a btree index:

```
"search_blogpost_a71a185f" btree (search_index)
```

which is not a great fit when you’re dealing with a full-text search field.
If you look through the documentation, you’ll find out there are actually better index types we can use, namely GiST and GIN. GIN seemed to have the best results in my case. After adding it:
```
"gin_idx" gin (search_index)
```
…I got an almost 3x speed-up on the same search (~53ms). Obviously this will make writes a bit slower, but it’s a small price to pay. Actually, the real performance killer here is the additional rank field, since the ts_rank function needs to go over the tsvector summaries of all returned documents, thereby slowing down the query.
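For reference, the GIN index above can be created with a statement along these lines (the table name search_blogpost is assumed from the btree index listing earlier):

```sql
-- Replace the btree index with a GIN index on the tsvector column.
CREATE INDEX gin_idx ON search_blogpost USING gin (search_index);
```

You could also add it through a Django migration or a custom SQL file, depending on how you manage your schema.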
Nevertheless, we’ve got pretty decent speeds for such a large amount of data, and all of it just by using PostgreSQL goodies!
But wait! There’s more…
There’s a chance you’ll need to add autocomplete to your search feature. Standard keyword searching won’t really cut it anymore. Here’s where the raw search queries come to the rescue:
```python
q = 'search phrase'
word_list = map(lambda word: word + ':*', tokenize(q))
query = " | ".join(word_list)
qs = BlogPost.objects.search(query, rank_field="rank", raw=True)[:15]
```
The tokenize function is taken from this snippet, which also includes a brief explanation of what it does.
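If you just want to see the prefix-query idea end to end, here is a minimal, hypothetical stand-in for tokenize (the linked snippet is more thorough); the :* suffix is PostgreSQL’s prefix-match marker for to_tsquery:

```python
import re

def tokenize(phrase):
    # Hypothetical minimal tokenizer: lowercase alphanumeric words only.
    # The snippet linked above handles more edge cases.
    return re.findall(r"\w+", phrase.lower())

q = 'search phrase'
word_list = [word + ':*' for word in tokenize(q)]
query = " | ".join(word_list)
print(query)  # search:* | phrase:*
```

Each word becomes a prefix pattern, so typing "sea phr" would still match documents containing "search phrase" — exactly what an autocomplete box needs.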
So there ya go! All that search functionality just by using PostgreSQL power! All done within a couple of lines of code, together with some database enhancements.