It warms my heart to see another example of test client abuse. We used the test client to implement a pure-Python ESI processor[1]. We ran it in production for a while, but no one should ever do that; Varnish is the answer. The code was still running on our development machines when I left.
For high-traffic apps, Varnish is the answer, since requests never hit the application layer.
If you think that's too complicated, try nginx-memcached - also an excellent solution.
If not that, try Django's template caching with memcached - also extremely fast, but it will hit the application layer.
If you're in some shared hosting environment without access to memcached (you're probably still too small to warrant this kind of aggressive caching on static assets - but hey, efficiency never hurt anybody :P), use Django's cache framework with a file-based cache. It's almost 100% equivalent to what this does, and you don't have any additional overhead.
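The file-based cache really is just a settings change; a minimal sketch (the cache directory is a hypothetical example - any writable path will do):

```python
# settings.py fragment: Django's file-based cache backend. No memcached
# daemon required, so it works on shared hosting. The LOCATION path here
# is a placeholder; any directory the web user can write to is fine.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.filebased.FileBasedCache",
        "LOCATION": "/var/tmp/django_cache",
    }
}
```

Per-view caching via `django.views.decorators.cache.cache_page` then works unchanged whether the backend is memcached or file-based.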
Beats me why people are reinventing the wheel - or am I missing something?
In most applications I work on, I lovingly use and abuse Memcached, Redis, and Varnish. If you’re working on an application that warrants using a live website and the whole application server shebang, then yes, I’d agree with you.
But for something like my blog and other non-dynamic websites which don’t update very much at all, I’m not sure if I see a pressing need for an application server. My blog previously ran varnish-nginx-uwsgi-django, but I was moving off of a VPS and it was the last thing left on that server. I got curious.
In the case of something like the L.A. Times’ Data Desk[2] projects (who use their own django-bakery app), if some views are very expensive/slow to generate, you can offload the work from the application server and do it in advance. (This makes sense if you want to just render everything out on a fast workstation or if you have a local database of several hundred gigabytes that you don’t want a live server querying to crunch the data.) It’s not out of the question to pregenerate HTML pages, JSON for visualizations, and simple image files (generated in PIL).
In any case: it's not so much a question of "high-traffic apps" as the tradeoff between (computation cost + server maintenance cost) and (an app that is server-side dynamic or updates frequently). Most people don't want to configure and maintain an app server (with cache layers and all) for a simple app, and those who don't seem to have uptime issues the moment they get any legitimate traffic: see [3].
So:

* I decided I didn't want to maintain an app server for my blog, and my historical average for updates is about once every four weeks (or even less frequently).
* People seemed to be big fans of Jekyll/Hyde[1], Movable Type's static publishing mode[4], WP Super Cache, etc.
* I felt a Django-friendly analogue to those would be cool.
* Like any developer tinkering with their own blog, there didn't have to be a point.
[1]: http://pypi.python.org/pypi/hyde/
[2]: http://datadesk.latimes.com/
[3]: http://inessential.com/2011/03/16/a_plea_for_baked_weblogs
[4]: http://daringfireball.net/linked/2011/03/18/brent-baked
Static files on S3 are far simpler than setting up Varnish and coming up with a reasonable strategy for cache invalidation. Varnish is great, but I don't know anyone who would describe setting it up as "simple." This solution, on the other hand, looks quite simple.
I don't see any particular gains, however, since high-traffic pages (static or otherwise) served off S3 will end up costing way more than on a shared host/dedicated server.
What would be an exact use case where this would really be important and help out? I'm just trying to understand what need it solves and under what circumstances hosting static pages on S3 is beneficial in terms of cost/performance.
Glad you like it.
Does it handle images? How about multiple sites/subdomains?
For more dynamic file storage (say, using FileField or ImageField in a model), I believe django-storages would work, too. (Make sure you also set Django's DEFAULT_FILE_STORAGE setting to the S3 backend.)
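A sketch of that django-storages wiring (using the boto-based S3 backend of that era; the bucket name and credentials are placeholders):

```python
# settings.py fragment: route Django's default file storage to S3 via
# django-storages. Bucket name and AWS credentials below are placeholders.
DEFAULT_FILE_STORAGE = "storages.backends.s3boto.S3BotoStorage"
AWS_STORAGE_BUCKET_NAME = "my-example-bucket"
AWS_ACCESS_KEY_ID = "..."      # placeholder
AWS_SECRET_ACCESS_KEY = "..."  # placeholder
```

With that in place, saves to a FileField/ImageField go to the bucket, and `field.url` resolves to the S3 URL.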
Assuming you’re managing your site via a local dev server (or a server that "hosts" the "hot type" version of the site), any time you "upload" a file to your local server, it'll actually upload to S3 (and any calls to "field.url" will actually map to the S3 URL). I'm not sure how well it'll work in all cases: I haven't actually used FileField or ImageField myself in the django-medusa + django-storages use case, but I have used both separately, so I’m fairly sure this is possible.
This is a pretty darn good question, though, so I’ll likely make a follow-up blog post with a more comprehensive walkthrough on handling staticfiles and FileField/ImageField sometime in the near future.
Multiple sites/subdomains are a bit more complicated. I’d say you should probably use separate Django instances for each and render them separately. (For S3, you’d need separate buckets anyway.) If they need to share data, you can create a separate Django settings module for each site while still using the same source tree and local database. (See the Django sites framework: [3])
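The per-site settings modules could look something like this (the module name and SITE_ID value are hypothetical examples, not a prescribed layout):

```python
# blog_settings.py -- a sketch of one of several per-site settings
# modules. Each site gets its own module (and its own S3 bucket) but
# shares the same code and database via the sites framework.
from settings import *  # the shared base settings module (hypothetical name)

SITE_ID = 2  # this site's row in the django_site table (example value)
```

You'd then render each site by pointing DJANGO_SETTINGS_MODULE at the appropriate module before running the generation command.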
[1]: https://docs.djangoproject.com/en/dev/howto/static-files/
[2]: http://django-storages.readthedocs.org/en/latest/backends/am...
[3]: https://docs.djangoproject.com/en/1.4/ref/contrib/sites/
What if you don’t want to use your own infrastructure? (See Ars Technica’s WWDC liveblog[1], which polls JSON files in the same directory that appeared to be updated periodically during the event by some software a reporter was using - ostensibly because the feature is short-lived, super-high-traffic (likely thousands of concurrent users), and should be as low-latency as possible given the nature of the event.)
Not to knock Varnish - I use it on plenty of larger things and love it. I just think there are use cases where you can rationalize not even having an application server to cache in front of.
[1] https://s3.amazonaws.com/liveblogs/wwdc-keynote-2012/index.h...