Eugene's Blog

I can't believe it's blog!

Code: RSS in Django

Update: "The Simple Way" part of this tutorial is now obsolete. I am going to recreate the examples using the new, improved RSS framework. Stay tuned!

I was asked several times to explain how I did RSS for my site. Django has an RSS framework, which is not documented. I am probably not the right guy to explain it, but I’ll try.

There are three ways to implement RSS with Django:

  1. The Simple: using Django’s RSS framework.
  2. The Smart: using django.utils.feedgenerator.
  3. The Hard: write a view and output XML manually or using standard xml.sax.saxutils.

If you want Django to do everything for you, use "The Simple Way". If you want some custom object selection, use "The Smart Way". For obsessed workaholics, S&M adepts, and guys-with-really-convoluted-needs the only way is "The Hard Way". Being lazy, I prefer #1 and #2. If you want #3, I suggest you study the files mentioned in "The Simple Way" subsection below.

The Simple Way

If you look into Django’s code, you will find several files related to RSS:

  • django/utils/feedgenerator.py – defines a generic feed object (you can set common parameters and add items to a feed), 2 types of RSS feed (they can render themselves, producing XML), and a default RSS feed.
  • django/core/rss.py – defines feed configurator object (it takes multiple functions, which define item generation), and simple feed registration facility. It uses feedgenerator.
  • django/views/rss/rss.py – uses registered feeds to produce XML file.
  • django/conf/urls/rss.py – routes incoming requests to RSS view above.

So let’s use it to create a feed for my blog documents.

1) We have to tell Django to generate RSS automatically. Let’s add a line to our url list. It is usually located in settings/urls/main.py.

from django.conf.urls.defaults import *

urlpatterns = patterns('',
    # my application urls
    (r'^rss/', include('django.conf.urls.rss')),
    # more my application urls
)

2) We have to describe our feed for Django. Let’s create main_rss.py file in our settings directory (next to our main.py):

from django.core import rss
from django.models.core import sites

# these are entries of my blog
from django.models.blog import documents

blog_document_feed = rss.FeedConfiguration(
    slug = 'blog',
    title_cb = lambda param: "I Can't Believe It's Blog!",
    link_cb = lambda param: 'http://%s/blog/' % sites.get_current().domain,
    description_cb = lambda param: 'The Great Example of Why 99.999% of Blogs Suck',
    get_list_func_cb = lambda param: documents.get_list,
    get_list_kwargs = {
        'limit': 10,
    }
)

rss.register_feed(blog_document_feed)

You can see that this definition is pretty much generic. It defines the name of the blog, the URL of the blog’s main page, the blog’s description, a function to get a list of my blog’s documents, and some parameters for that function. I decided to limit my list to the ten most recent items. Pay special attention to the slug argument: it serves as the identifier for this feed.
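
To make the role of the slug and the callbacks concrete, here is a toy model in plain Python. It is not Django's code: `ToyFeedConfig`, `FEEDS`, `register` and `describe` are hypothetical names, and `example.com` is a placeholder domain. The sketch only assumes what is described above: every callback receives the URL parameter, and a feed is found by its slug.

```python
# Toy model only: none of these names come from Django.
FEEDS = {}  # slug -> configuration


class ToyFeedConfig:
    def __init__(self, slug, title_cb, link_cb, description_cb):
        self.slug = slug
        self.title_cb = title_cb
        self.link_cb = link_cb
        self.description_cb = description_cb


def register(config):
    """Remember the configuration under its slug."""
    FEEDS[config.slug] = config


def describe(slug, param=None):
    """Look a feed up by slug and call each callback with the parameter."""
    config = FEEDS[slug]
    return {
        "title": config.title_cb(param),
        "link": config.link_cb(param),
        "description": config.description_cb(param),
    }


register(ToyFeedConfig(
    slug="blog",
    title_cb=lambda param: "I Can't Believe It's Blog!",
    link_cb=lambda param: "http://example.com/blog/",  # placeholder domain
    description_cb=lambda param: "The Great Example of Why 99.999% of Blogs Suck",
))

print(describe("blog")["title"])
```

The point of the indirection is that the same configuration object can serve a plain feed (the parameter is ignored) and a parameterized one (the parameter picks the author).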

Typically an RSS feed item should have a title, a description, and a URL. How does Django know about them? Title and description are generated using templates. You have to define template files with the following names: rss/yourslug_title.html and rss/yourslug_description.html. In my example yourslug is blog. Let’s create these files in our templates/rss directory.

blog_title.html:

{{ obj.title }}

blog_description.html:

{{ obj.get_teaser }}

title is a field of my blog’s document; get_teaser() is a method that creates a teaser (a teaser is used when the document’s body is too long).

Now we have to take care of URL. It requires adding get_absolute_url() method to your model, which returns (you guessed it right) an absolute URL of your item. In my case I already have get_full_path() method, which returns relative path. I’ll reuse it. Let’s add the method to our model (Document):

def get_absolute_url(self):
    return '/blog/' + self.get_full_path()

Now we are all set, and Django will generate RSS when somebody accesses the /rss/yourslug/ URL. Go to /rss/blog/ to see the result.

So far so good. What about some parameters? Let’s create a feed for a specific author. Our main_rss.py will look like this:

from django.core import rss
from django.models.core import sites
from django.models.blog import documents, authors

blog_document_feed = rss.FeedConfiguration(
    slug = 'blog',
    title_cb = lambda param: "I Can't Believe It's Blog!",
    link_cb = lambda param: 'http://%s/blog/' % sites.get_current().domain,
    description_cb = lambda param: 'The Great Example of Why 99.999% of Blogs Suck',
    get_list_func_cb = lambda param: documents.get_list,
    get_list_kwargs = {
        'limit': 10,
    },
)

blog_author_feed = rss.FeedConfiguration(
    slug = 'authors',
    title_cb = lambda param: "I Can't Believe It's Blog! by " + authors.get_object(pk=param).name,
    link_cb = lambda param: 'http://%s/blog/authors/%s/' % (sites.get_current().domain, param),
    description_cb = lambda param: 'The Great Example of Why 99.999% of Blogs Suck',
    get_list_func_cb = lambda param: documents.get_list,
    get_list_kwargs = {
        'limit': 10,
        'order_by': ['-pub_date'],
    },
    param_func = lambda param: param,
    param_kwargs_cb = lambda param: {'param': param},
    get_list_kwargs_cb = lambda param: {'author__id__exact': param},
)

rss.register_feed(blog_document_feed)
rss.register_feed(blog_author_feed)

The new feed configuration is a little more complicated. We have a special function to select documents by author id. Plus we have two dummy functions to process our parameter.
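
One way to picture how the fixed `get_list_kwargs` and the per-request dictionary from `get_list_kwargs_cb` come together is the sketch below. It is an assumption about the framework's behavior, not a copy of its code; `build_lookup` and `fake_get_list` are hypothetical stand-ins.

```python
def build_lookup(static_kwargs, kwargs_cb, param):
    """Combine fixed lookup arguments with those derived from the URL parameter."""
    lookup = dict(static_kwargs)     # copy so the configuration is not mutated
    lookup.update(kwargs_cb(param))  # per-request arguments win on conflict
    return lookup


def fake_get_list(**kwargs):
    """Stand-in for documents.get_list: just report what it was asked for."""
    return sorted(kwargs.items())


lookup = build_lookup(
    {"limit": 10, "order_by": ["-pub_date"]},
    lambda param: {"author__id__exact": param},
    "1",
)
print(fake_get_list(**lookup))
```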

You can see that our new slug is authors. We should add appropriate templates. Copy existing blog_title.html to authors_title.html and blog_description.html to authors_description.html. That’s it! Go to /rss/authors/1/ to see it in action.
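
The two URLs used so far, /rss/blog/ and /rss/authors/1/, follow a slug-then-parameter shape. The sketch below shows that split with a regular expression. The pattern is my assumption about the shape of these URLs, not the pattern Django itself uses, and `split_feed_url` is a hypothetical helper.

```python
import re

# Assumed URL shape: /rss/<slug>/ or /rss/<slug>/<param>/
FEED_URL = re.compile(r"^/rss/(?P<slug>\w+)/(?:(?P<param>[^/]+)/)?$")


def split_feed_url(path):
    """Return (slug, param) for a feed URL, or None if it does not match."""
    match = FEED_URL.match(path)
    if match is None:
        return None
    return match.group("slug"), match.group("param")


print(split_feed_url("/rss/blog/"))
print(split_feed_url("/rss/authors/1/"))
```

The slug selects the registered feed; the optional parameter is what the callbacks receive.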

Are you with me so far?

The Smart Way

Why do we need anything else? Because we want to customize our RSS feed. Plus existing RSS framework has some deficiencies:

  • My blog document model has a concept of authors. RSS 2.0 has a concept of authors. Our RSS doesn’t have authors.
  • My blog document model has a concept of categories. RSS 2.0 allows specifying categories. Our RSS doesn’t have them. It will be a problem with some blog aggregators.
  • Complex criteria may produce hairy feed configurations with external functions.
    • As you are all aware, I have hierarchical categories. I want two types of feed per category: documents that belong to the given category, and documents that belong to the given category or any of its descendants. The latter requires some non-trivial coding.
  • Configuration is very convoluted: you have to define a lot of callbacks even in simple RSS feeds with parameters.
  • Each RSS feed requires two template files. Almost all of them are going to be the same, but you cannot reuse them.

Given all that, the RSS framework is going to be overhauled. I hope it’ll come with official documentation and Atom feeds.

In order to be able to customize RSS we are going to use feedgenerator directly in our views. Let’s reimplement feeds from the previous subsection. We are going to create RSS feed and populate it manually. It gives us freedom to define any subset and we can rearrange items as we please:

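# imports the views below rely on (model module names follow my blog application)
from django.core.exceptions import Http404
from django.models.blog import authors, categories, documents
from django.models.core import sites
from django.utils.feedgenerator import Rss201rev2Feed
from django.utils.httpwrappers import HttpResponse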

def rss201(request):
    try:
        object_list = documents.get_list(order_by=['-pub_date'], limit=10)
    except documents.DocumentDoesNotExist:
        raise Http404
    current_site = sites.get_current()
    blog_link = u'http://%s/blog/' % current_site.domain
    feed = Rss201rev2Feed( u"I Can't Believe It's Blog!", blog_link,
        u'The Great Example of Why 99.999% of Blogs Suck' )
    for object in object_list:
        author = object.get_author()
        link = blog_link + object.get_full_path()
        feed.add_item( object.title.encode('utf-8'), link, object.get_teaser().encode('utf-8'),
            author_email=author.email.encode('utf-8'), author_name=author.name.encode('utf-8'),
            pubdate=object.pub_date, unique_id=link,
            categories=[x.encode('utf-8') for x in object.get_simplified_categories()] )
    response = HttpResponse(mimetype='application/xml')
    feed.write(response, 'utf-8')
    return response

def author_rss201(request, object_id):
    try:
        author = authors.get_object(pk=object_id)
        object_list = author.get_document_list(order_by=['-pub_date'], limit=10)
    except authors.AuthorDoesNotExist:
        raise Http404
    except documents.DocumentDoesNotExist:
        raise Http404
    current_site = sites.get_current()
    blog_link = u'http://%s/blog/' % current_site.domain
    feed = Rss201rev2Feed(
        u"I Can't Believe It's Blog! by " + author.name,
        u'%sauthors/%d/' % (blog_link, author.id),
        u'The Great Example of Why 99.999% of Blogs Suck' )
    for object in object_list:
        author = object.get_author()
        link = blog_link + object.get_full_path()
        feed.add_item( object.title.encode('utf-8'), link, object.get_teaser().encode('utf-8'),
            author_email=author.email.encode('utf-8'), author_name=author.name.encode('utf-8'),
            pubdate=object.pub_date, unique_id=link,
            categories=[x.encode('utf-8') for x in object.get_simplified_categories()] )
    response = HttpResponse(mimetype='application/xml')
    feed.write(response, 'utf-8')
    return response

See results here: /blog/rss201.xml and /blog/authors/1/rss201.xml.

You can see that it takes about twice as much code, but now we have authors and categories. Note that the author RSS feed retrieves relevant documents differently. That will come in handy if you want complex document selection. Additional benefits are:

  • No need for external templates for titles and descriptions in simple cases.
  • No need for adding extra methods to models (obviously).
  • No need for document to know its exact URL – I don’t like that idea at all.
  • You can implement any URL schema you like instead of your_rss_root/yourslug/parameters.

Obvious drawbacks are:

  • It takes more code.
  • While the code is pretty much generic, you have to tailor it for your models.
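
Part of the extra code is the item loop, which is identical in both views above, so it can live in one helper. The sketch below shows the idea in current Python with stand-ins so that it runs without Django: `Author`, `Document`, `RecordingFeed` and `add_documents` are hypothetical names, the sample data is made up, and a real view would pass an `Rss201rev2Feed` instead of the recording object.

```python
from collections import namedtuple

# Stand-ins for the blog models; the fields mirror what the views above read.
Author = namedtuple("Author", "name email")
Document = namedtuple("Document", "title teaser path pub_date author categories")


class RecordingFeed:
    """Minimal stand-in for a feed object: it only records add_item() calls."""

    def __init__(self):
        self.items = []

    def add_item(self, title, link, description, **extra):
        self.items.append(dict(title=title, link=link, description=description, **extra))


def add_documents(feed, blog_link, docs):
    """Add every document to the feed the same way each view does."""
    for doc in docs:
        link = blog_link + doc.path
        feed.add_item(
            doc.title, link, doc.teaser,
            author_email=doc.author.email, author_name=doc.author.name,
            pubdate=doc.pub_date, unique_id=link,
            categories=list(doc.categories),
        )


# Made-up sample data, for illustration only.
feed = RecordingFeed()
add_documents(feed, "http://example.com/blog/", [
    Document("First", "Teaser", "2005/first/", None,
             Author("Some Author", "author@example.com"), ["django"]),
])
print(feed.items[0]["link"])
```

With a helper like this each view shrinks to selecting documents, creating the feed, and writing the response.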

All pros and cons depend on your specific needs. You be the judge.

What about RSS feed for categories? This is the code:

# exclusive version
def category_xrss201(request, object_id):
    try:
        category = categories.get_object(pk=object_id)
        object_list = category.get_document_list(order_by=['-pub_date'], limit=10)
    except categories.CategoryDoesNotExist:
        raise Http404
    except documents.DocumentDoesNotExist:
        raise Http404
    current_site = sites.get_current()
    blog_link = u'http://%s/blog/' % current_site.domain
    feed = Rss201rev2Feed(
        u"I Can't Believe It's Blog! - " + category.full_name,
        u'%scategories/%d/' % (blog_link, category.id),
        u'The Great Example of Why 99.999% of Blogs Suck' )
    for object in object_list:
        author = object.get_author()
        link = blog_link + object.get_full_path()
        feed.add_item( object.title.encode('utf-8'), link, object.get_teaser().encode('utf-8'),
            author_email=author.email.encode('utf-8'), author_name=author.name.encode('utf-8'),
            pubdate=object.pub_date, unique_id=link,
            categories=[x.encode('utf-8') for x in object.get_simplified_categories()] )
    response = HttpResponse(mimetype='application/xml')
    feed.write(response, 'utf-8')
    return response

# category with descendants
def category_rss201(request, object_id):
    # This code retrieves documents for the category and its descendants.
    # It can be improved but my categories don't have a lot of descendants
    # so why bother?
    # One good solution is described here:
    #     http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/141934
    try:
        category = categories.get_object(pk=object_id)
        object_list = category.get_document_list(order_by=['-pub_date'], limit=10)
        for c in category.get_all_children():
            object_list.extend(c.get_document_list(order_by=['-pub_date'], limit=10))
    except categories.CategoryDoesNotExist:
        raise Http404
    except documents.DocumentDoesNotExist:
        raise Http404
    object_list.sort(lambda a, b: cmp(b.pub_date,a.pub_date))
    i = 1
    while i < len(object_list) and i < 10:
        if object_list[i-1].id == object_list[i].id:
            del object_list[i]
        else:
            i = i + 1
    del object_list[10:]
    # now let's create a feed
    current_site = sites.get_current()
    blog_link = u'http://%s/blog/' % current_site.domain
    feed = Rss201rev2Feed(
        u"I Can't Believe It's Blog! - extra " + category.full_name,
        u'%scategories/%d/' % (blog_link, category.id),
        u'The Great Example of Why 99.999% of Blogs Suck' )
    for object in object_list:
        author = object.get_author()
        link = blog_link + object.get_full_path()
        feed.add_item( object.title.encode('utf-8'), link, object.get_teaser().encode('utf-8'),
            author_email=author.email.encode('utf-8'), author_name=author.name.encode('utf-8'),
            pubdate=object.pub_date, unique_id=link,
            categories=[x.encode('utf-8') for x in object.get_simplified_categories()] )
    response = HttpResponse(mimetype='application/xml')
    feed.write(response, 'utf-8')
    return response
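
The merge step in category_rss201 (collect, sort newest first, drop duplicates, keep ten) can be sketched on its own in current Python, where cmp-style sorting no longer exists. This is an illustration, not the code running on this site: `Doc` and `newest_unique` are hypothetical names, and the set of seen ids is a swapped-in technique that also catches duplicates that do not end up next to each other after sorting.

```python
from collections import namedtuple
from datetime import date

Doc = namedtuple("Doc", "id pub_date")  # stand-in for a blog document


def newest_unique(doc_lists, limit=10):
    """Merge several document lists, newest first, keeping each id once."""
    merged = sorted(
        (doc for docs in doc_lists for doc in docs),
        key=lambda doc: doc.pub_date,
        reverse=True,
    )
    seen = set()
    result = []
    for doc in merged:
        if doc.id in seen:
            continue  # the same document reached us through another category
        seen.add(doc.id)
        result.append(doc)
        if len(result) == limit:
            break
    return result


parent = [Doc(1, date(2005, 11, 3)), Doc(2, date(2005, 11, 1))]
child = [Doc(3, date(2005, 11, 2)), Doc(1, date(2005, 11, 3))]
print([doc.id for doc in newest_unique([parent, child])])
```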

Not satisfied yet?

The Hard Way

Why do you need anything else? "The Simple Way" is restricted by FeedConfiguration. That’s why we didn’t have categories and authors embedded in our RSS feed. "The Smart Way" is restricted by SyndicationFeed of feedgenerator – some RSS parameters are not implemented. It gives you a superset of "The Simple Way" but it is a subset of full RSS spec.
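
For orientation only, here is roughly what the hard way amounts to when done with the standard xml.sax.saxutils module, written in current Python. It is a minimal sketch, not a full RSS 2.0 implementation: `write_rss` is a hypothetical helper, `example.com` is a placeholder, and in a real view the output object would be the HTTP response rather than a string buffer.

```python
from io import StringIO
from xml.sax.saxutils import XMLGenerator


def write_rss(out, title, link, description, items):
    """Write a minimal RSS 2.0 document; items are (title, link, description) tuples."""
    xml = XMLGenerator(out, encoding="utf-8")

    def element(name, text):
        xml.startElement(name, {})
        xml.characters(text)  # escapes &, < and > for us
        xml.endElement(name)

    xml.startDocument()
    xml.startElement("rss", {"version": "2.0"})
    xml.startElement("channel", {})
    element("title", title)
    element("link", link)
    element("description", description)
    for item_title, item_link, item_description in items:
        xml.startElement("item", {})
        element("title", item_title)
        element("link", item_link)
        element("description", item_description)
        xml.endElement("item")
    xml.endElement("channel")
    xml.endElement("rss")
    xml.endDocument()


buffer = StringIO()
write_rss(
    buffer,
    "I Can't Believe It's Blog!",
    "http://example.com/blog/",  # placeholder domain
    "The Great Example of Why 99.999% of Blogs Suck",
    [("Hello & welcome", "http://example.com/blog/hello/", "First post")],
)
rss_xml = buffer.getvalue()
print(rss_xml)
```

Every extra RSS element you need is one more element() call, which is both the freedom and the price of this approach.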

If you really want to do it the hard way, you don’t need my simple tutorial.

Conclusion

If you want to define a simple RSS feed, do it "The Simple Way". And be ready to change your code after the overhaul.

If you want to define an RSS feed with extra elements (notably categories), or your data representation or retrieval criteria are non-trivial, do it "The Smart Way". I think this method will survive the overhaul.

If you are not in the above categories, you should ask yourself: Am I doing the right thing? If the answer is yes, do it "The Hard Way".

Update: when I initially posted this article, it was cut off in the middle. Why? Because of this bug. So if you use MySQL with Django, you should apply my trivial patch and manually change all text fields in your existing database from TEXT to LONGTEXT.