The second part was a bit more "fancy". There were two really slow parts to this: (a) fetching the origin (from S3) and (b) applying lossless compression (for a large image, it can take 10+ seconds). Fetching from origin is easily solved by caching the origin to disk. So if you ask for goku.png?w=90001&h=9001 and then goku.png?w=2393&h=43433 it's only going to be 1 origin fetch. For the lossless compression, we just used the filesystem as a queue. We'll serve up the uncompressed image with a short cache header (maybe 10 minutes) and store it in /storage/uncompressed. The filesystem is monitored and when a file is added, we compress it and then move it to /storage/compressed.
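A minimal sketch of the "filesystem as a queue" idea, in Python. The comment doesn't say how the directory was monitored, so this assumes a simple polling pass; `drain_once` and the `compress` callable are hypothetical names, and the real compressor would be whatever lossless optimizer you use.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def drain_once(uncompressed: Path, compressed: Path,
               compress: Callable[[bytes], bytes]) -> int:
    """Compress every file waiting in `uncompressed`; return how many were moved."""
    compressed.mkdir(parents=True, exist_ok=True)
    moved = 0
    for src in sorted(uncompressed.iterdir()):
        if not src.is_file():
            continue
        data = compress(src.read_bytes())
        # Write to a temp file in the destination directory, then rename it
        # into place, so a reader never sees a half-written compressed file.
        fd, tmp = tempfile.mkstemp(dir=compressed)
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        os.replace(tmp, compressed / src.name)
        src.unlink()  # the queue entry is done
        moved += 1
    return moved
```

Calling `drain_once` in a loop with a short sleep (or from a filesystem-notification callback) would give the background behaviour described above.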
So, when you serve an image, the flow is:
- check for the file in /storage/compressed/ and serve that with a long cache header (this is a fully transformed image (hash the querystring parameters))
- check for the file in /storage/uncompressed/ and serve that with a short cache header (this is a fully transformed image (hash the querystring parameters))
- Check if we at least have the original in /storage/original
- if not, fetch the original, put it in /storage/original
- Transform the image, store it at /storage/uncompressed and serve it up
- In the background, compress images and move them from /storage/uncompressed to /storage/compressed
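The lookup order above could be sketched roughly like this in Python. Everything here is illustrative: `serve`, `variant_key`, the `fetch_origin` and `transform` callables, the SHA-1 key and the exact cache header values are assumptions, not the code we actually ran.

```python
import hashlib
from pathlib import Path
from typing import Callable, Tuple
from urllib.parse import urlencode

LONG_CACHE = "public, max-age=31536000"  # assumed value for "long"
SHORT_CACHE = "public, max-age=600"      # roughly the 10 minutes mentioned


def variant_key(name: str, params: dict) -> str:
    """Hash the file name plus the sorted querystring parameters."""
    canonical = name + "?" + urlencode(sorted(params.items()))
    return hashlib.sha1(canonical.encode()).hexdigest()


def serve(root: Path, name: str, params: dict,
          fetch_origin: Callable[[str], bytes],
          transform: Callable[[bytes, dict], bytes]) -> Tuple[bytes, str]:
    """Return (image bytes, Cache-Control value) following the flow above."""
    key = variant_key(name, params)
    compressed = root / "compressed" / key
    if compressed.is_file():
        return compressed.read_bytes(), LONG_CACHE
    uncompressed = root / "uncompressed" / key
    if uncompressed.is_file():
        return uncompressed.read_bytes(), SHORT_CACHE
    original = root / "original" / name
    if not original.is_file():
        # Only hit the origin (S3 in our case) when nothing is cached on disk.
        original.parent.mkdir(parents=True, exist_ok=True)
        original.write_bytes(fetch_origin(name))
    body = transform(original.read_bytes(), params)
    uncompressed.parent.mkdir(parents=True, exist_ok=True)
    uncompressed.write_bytes(body)  # picked up later by the background compressor
    return body, SHORT_CACHE
```

With this shape, two requests for the same file with different dimensions share one origin fetch, and a variant starts getting the long cache header once its compressed copy exists.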
It might seem like overkill when you consider that, despite serving thousands of images per second, the CDN handles almost every request. The problem is with the lossless compression. We found it impossible to do it on-the-fly for too many of our images, so you absolutely need that available and ready to go for the 5% CDN miss.
Do you handle the malicious case of someone supplying various widths and heights potentially DoSing the server?
Whether storing it back on S3 is "good enough" depends on whether you feel the latency to fetch from S3 is acceptable. I don't have any hard numbers (I might have at some point). I imagine you'll see a percentage in the 1-4s range, which is pretty bad considering you still have to serve it to the CDN and then the CDN to the user. If you have users on mobile or in developing countries, you do what you can to make your side as fast as possible.
Never had malicious users, but we worried about it. We took a reactive approach: monitoring disk space usage. It never proved necessary to do more. You're definitely open to a DoS attack. Hard to mitigate too... you can't rate limit since the request comes from the CDN. You could whitelist certain dimensions, but we also allowed our content owners to specify the focal point of the image, which we'd center our crop on, which means any value of x and y is reasonable. You could possibly store that data on the image servers, instead of passing it in the querystring, but then you're introducing state and, with multiple servers, synchronisation. shudder.
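For what it's worth, here is a rough Python sketch of the whitelist idea. It is not something we ran: the size list is made up, and snapping the focal point to a coarse grid is my own assumption about how you might keep the number of distinct variants bounded while still taking x and y from the querystring (it also assumes the focal point is given as fractions between 0 and 1).

```python
from typing import Optional, Tuple

# Hypothetical whitelist of output sizes; anything else is rejected.
ALLOWED_SIZES = {(263, 220), (400, 200), (200, 200)}
# Assumption: focal points are snapped to a 5% grid (20 steps per axis).
GRID_STEPS = 20


def normalize_request(width: int, height: int,
                      fx: float, fy: float) -> Optional[Tuple[int, int, float, float]]:
    """Return a canonical (w, h, fx, fy) tuple, or None if the request is refused."""
    if (width, height) not in ALLOWED_SIZES:
        return None
    if not (0.0 <= fx <= 1.0 and 0.0 <= fy <= 1.0):
        return None
    snap = lambda v: round(v * GRID_STEPS) / GRID_STEPS
    return width, height, snap(fx), snap(fy)
```

Hashing the normalized tuple instead of the raw querystring would cap the cache at a fixed number of variants per image, at the cost of slightly less precise crops.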
You can see it in action at:
http://0.viki.io/viki.jpg?s=263x220&q=h
with documentation at:
http://dev.viki.com/v4/images/
(the [q]uality argument isn't documented, weird....unless you specify a quality (I only remember [h]igh) we pick a jpg compression based on the filesize)
File size was a problem for us too, as was file format. We needed a solution that would work with pngs, jpegs, gifs and tiffs.
You name it and ImageMagick could do it:
- Leak memory
- Perform terribly until you find the magic incantation that is 10x faster
- Output wildly different images after a minor patch release
- Remove / change options after a minor patch release
- Enormously degrade performance after a minor patch release
- Have numerous security vulnerabilities all the time, which require frequent upgrades
- Dump core more often than you might like
It took us upwards of 3 months to simply move from one ImageMagick release to another (a few minor versions ahead), and we had to do all sorts of workarounds and A/B tests to ensure the images would look right. I heard that GraphicsMagick was superior in that it maintained some consistency of behavior between versions, but it doesn't have all of the functionality of ImageMagick. So we couldn't switch to it.
Another company that I worked for had a fleet of several thousand servers running constantly just to thumbnail user uploaded images, and it was not unheard of for it to fall behind.
IM / GM are the stock answer for processing images, but in my experience they have no place in a production system. I think this is an area that is pretty poorly served by open source software; there are lots of libraries to handle different image formats, but no good infrastructure exists to tie it all together (that I'm aware of).
For gifs, we used gifsicle, but we didn't support the full set of functions with it.
Nginx to handle existing files. Python + Pillow + cherrypy (or could be any other microframework) to handle image processing on the fly and then caching processed image to the disk.
All in all around 350 lines of code in python. And something like 100 lines in nginx (because of a heavy filename processing and inner url rewriting).
Result - facebook-like image processing:
Simple resize: http://media.example.com/w400x200/id_token_string.jpg
Crop (based on coords): http://media.example.com/20.20.380.300/id_token_string.jpg
Or crop resize from center: http://media.example.com/c200x200/id_token_string.jpg
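To give an idea of the filename processing, here is a small stdlib-only Python sketch of how those URL segments might be parsed. It is not the actual code: `parse_op`, `Op` and `center_crop_box` are illustrative names, and the four crop numbers are returned as-is without assuming whether they mean two corners or a corner plus a size.

```python
import re
from typing import NamedTuple, Optional, Tuple


class Op(NamedTuple):
    kind: str
    args: Tuple[int, ...]


_PATTERNS = [
    ("resize", re.compile(r"w(\d+)x(\d+)")),              # w400x200
    ("center_crop", re.compile(r"c(\d+)x(\d+)")),         # c200x200
    ("crop", re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)")),  # 20.20.380.300
]


def parse_op(segment: str) -> Optional[Op]:
    """Turn the first path segment into an operation, or None if unrecognized."""
    for kind, pattern in _PATTERNS:
        match = pattern.fullmatch(segment)
        if match:
            return Op(kind, tuple(int(g) for g in match.groups()))
    return None


def center_crop_box(src_w: int, src_h: int,
                    dst_w: int, dst_h: int) -> Tuple[int, int, int, int]:
    """Largest centered box in the source that has the target aspect ratio."""
    scale = min(src_w / dst_w, src_h / dst_h)
    box_w, box_h = round(dst_w * scale), round(dst_h * scale)
    left, top = (src_w - box_w) // 2, (src_h - box_h) // 2
    return left, top, left + box_w, top + box_h
```

The box comes back as (left, upper, right, lower), which is the order Pillow's `Image.crop` takes, so the center crop would be a crop with that box followed by a resize to the target size.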
and so on...
Fun project.
A: Imgix = cost of imgix subscription + integration time * hourly rate (it's dead simple)
B: Rolling your own = dev time * hourly rate + maintenance/ops time * hourly rate
For most orgs B > A. For most individual programmers, since hourly rate is not a factor, B < A.
It's good to be dev and have spare time for stuff like that. You learn stuff while making such projects, you save money and it just works.
About money: judging by the pricing on the imgix page (and if I understand the pricing correctly), I'm saving around $300 per month ($50 for the cheapest plan + $250 for traffic).
So, the answer is pretty much this: it'll handle as many rps as nginx serving static files can handle.
If we are talking about unique requests, each asking to process a unique image, I'm pretty sure the CPU of the Linode server it's running on will choke.