Postgres and Redshift are two different products for two very different problems. Postgres is good for smaller sets of transactional data, like orders in a shopping cart system (less than 1TB). Redshift is good for big sets of data involving user behavior and clickstream analysis (greater than 1TB). I would not want to manage clickstream data on a single instance of Postgres, nor would I want to manage an order system in Redshift.
A better test of Redshift would be to see how it compares to Asterdata...particularly with both in AWS. That should be telling.
With that being said, I'd be interested to see how RedShift compares to Impala.
* You're measuring request latency. How much of that (for RedShift) is due to the network? (EDIT: I re-read and saw you're using `SELECT 1` as a gauge for round-trip latency and subtracting it from the results. Are you doing this only for RedShift, or also for local PostgreSQL? To me, that heuristic seems overly broad -- it captures not only network latency, but also syscall overhead, query parsing, etc.)
* In your tests, PostgreSQL without indices performs on par with RedShift. Does RedShift not support indexing? Is there some metric you're trying to show by not using indices? As designed, this benchmark does not map to any use case I've ever seen.
* Not sure about the index support. Didn't try.
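For reference, the `SELECT 1` baseline subtraction discussed above can be sketched as follows. This is a minimal sketch, not the benchmark's actual code: `time_calls` and `adjusted_latency` are hypothetical helper names, and using the median is an assumption made to damp outliers. As the objection above points out, the baseline absorbs network round trip *and* fixed per-query overhead, so the result is only a rough estimate.

```python
import statistics
import time
from typing import Callable, List


def time_calls(fn: Callable[[], object], repeats: int = 5) -> List[float]:
    """Run fn() `repeats` times and return each wall-clock duration in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return samples


def adjusted_latency(query_samples: List[float], baseline_samples: List[float]) -> float:
    """Median query time minus median `SELECT 1` time.

    The baseline includes network round trip plus parsing, planning and
    syscall overhead, so this over-subtracts rather than isolating the network.
    """
    return statistics.median(query_samples) - statistics.median(baseline_samples)
```

Assuming `cur` is a psycopg2 cursor (Redshift speaks the PostgreSQL wire protocol, so the same driver works against both), you would time `lambda: cur.execute("SELECT 1")` and the real query the same way, and apply the same correction to local PostgreSQL so the comparison stays fair.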
My idea was quite simple. I have some data at work (databases of up to 30GB), and every so often we look for something better. The main question was: will RedShift help? Will it be radically faster? Will it be radically easier?
The answer for me: no, it won't help in my case. We need that 30GB of data in real time, and RedShift looks better suited to cases where you have 1TB+ of data. Yes, it is radically easier.
> Amazon Redshift doesn’t require indexes or materialized views and so uses less space than traditional relational database systems.
Reading through the rest of their FAQ, it sounds like they echo your conclusion -- RedShift shines the most for use-cases where the dataset is large enough that, to use PostgreSQL, you'd have to shard out multiple instances.
--
And as others have pointed out, your 30 GB data set is pretty tiny. You could look at some of the in-memory DB options out there if you need to speed things up.
The local setup was pretty ordinary: PostgreSQL 9.2 on Mint 13 with the default config, running in VirtualBox on an iMac (i5, 12GB RAM). Read: a home computer, no tuning.
For me, the takeaway is that RedShift is mostly on par with local PostgreSQL, and sometimes even wins below 5M rows. So with better PostgreSQL tuning you could probably stretch Postgres further, but not as far as RedShift can go with REALLY big data.
Also the big deal was that RedShift scaled linearly.
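One way to check a "scaled linearly" claim against your own measurements is an ordinary least-squares fit of query time against row count. The sketch below is illustrative only: `fit_line` is a hypothetical helper, and the sample inputs in the usage note are made up, not results from this benchmark.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, r_squared)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 close to 1 means a straight line explains the timings well.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared
```

Feed it row counts as `xs` and seconds as `ys`; an R² near 1 with a small intercept suggests roughly constant per-row cost, while a curve (e.g. timings that grow like `[1, 4, 9, 16]` for `[1, 2, 3, 4]` rows) shows up as a visibly lower R².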
The default Postgres configuration is pretty weak. work_mem, for instance, is set way too low, and that's bitten me a few times. I wouldn't call it unrealistic: lots of people run it that way in production and never find out how easily they could speed things up. I did, for years.
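The reason the default is conservative is that work_mem applies per sort or hash operation, not per query or per server, so the worst case multiplies across connections and plan nodes. The sketch below works through that arithmetic. The 25%-of-RAM budget is an assumed rule of thumb, not an official PostgreSQL formula, and both function names are hypothetical.

```python
def worst_case_work_mem_mb(work_mem_mb: float, max_connections: int,
                           sort_hash_ops_per_query: int) -> float:
    """Upper bound if every connection runs a query whose sort/hash nodes each use full work_mem."""
    return work_mem_mb * max_connections * sort_hash_ops_per_query


def suggest_work_mem_mb(ram_mb: float, max_connections: int,
                        sort_hash_ops_per_query: int, ram_fraction: float = 0.25) -> int:
    """Assumed rule of thumb: reserve a fraction of RAM for sorts/hashes and
    split it across the worst case. Never returns less than 1 MB."""
    budget_mb = ram_mb * ram_fraction
    return max(1, int(budget_mb // (max_connections * sort_hash_ops_per_query)))
```

For a 12GB machine with 100 connections and about two sort/hash nodes per query, this suggests 15MB (worst case 3000MB, within the 3072MB budget). In practice you can also raise it only for a heavy reporting session with `SET work_mem = '64MB';` rather than globally in `postgresql.conf`.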
But ultimately I'm more swayed by your hands-on experience with it, and I hate the endless benchmark tweaking that follows every blog post about performance testing. The point of Redshift is hugeness first and foremost, so it's interesting.
For example, until your data grows beyond 20 GB, you could host it on a $5 SSD-based server from DigitalOcean. Most databases are bottlenecked by I/O, so switching to SSDs gives you the biggest performance boost.
On the high end, Amazon's offering will probably be better. (After all, the major draw of Amazon is "automatic scaling", so you don't have to worry about replication or other server administration duties at the high end.) But considering how powerful a $5 SSD-based virtual machine is today, I think a more realistic test would use some sort of SSD-based cloud server.
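The I/O-bottleneck argument is easy to put into numbers. This is back-of-the-envelope arithmetic under assumed ballpark figures, not measurements: the throughput and IOPS values you plug in (for example ~100 MB/s sequential for a single spinning disk versus several hundred MB/s for a SATA SSD, and a few hundred random IOPS versus thousands or more) are illustrative assumptions.

```python
def full_scan_seconds(dataset_gb: float, throughput_mb_per_s: float) -> float:
    """Time to read the whole dataset once at a sustained sequential throughput."""
    return dataset_gb * 1024 / throughput_mb_per_s


def random_reads_seconds(n_reads: int, iops: float) -> float:
    """Time to serve n independent random page reads at a given IOPS rate."""
    return n_reads / iops
```

With those assumed figures, a 30GB full scan takes about 307 seconds at 100 MB/s, while a million random page reads take 100 seconds at 10,000 IOPS but well over an hour at a few hundred IOPS. The random-read gap is usually where SSDs help a row store like Postgres the most.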
We have found that Redshift is comparable to other columnar databases we work with. While we cannot publish any comparative benchmarks, we did put up a blog post on what we found (link in another comment here).
The main issue with Redshift is the lack of multiple sort orders on a table. Take a look at our blog post on first impressions gleaned during the preview. Disclosure: we are one of a couple of systems integrator partners for Redshift.
http://www.full360.com/2013/02/14/aws-redshift-full360-first...
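To see why a single sort order matters, here is a simplified model of block pruning: each storage block keeps the min/max of a column, and a range filter only reads blocks whose range overlaps it. Redshift does keep per-block min/max metadata (zone maps), but this sketch is a toy model, not its implementation, and the `(event_time, user_id)` columns are hypothetical.

```python
from typing import List, Sequence, Tuple

Row = Tuple[int, int]  # hypothetical (event_time, user_id)


def build_zone_map(rows: Sequence[Row], block_size: int, col: int) -> List[Tuple[int, int]]:
    """Min/max of column `col` for each fixed-size block, in storage order."""
    zones = []
    for start in range(0, len(rows), block_size):
        values = [r[col] for r in rows[start:start + block_size]]
        zones.append((min(values), max(values)))
    return zones


def blocks_to_scan(zones: List[Tuple[int, int]], lo: int, hi: int) -> int:
    """Count blocks whose [min, max] overlaps the predicate lo <= value <= hi."""
    return sum(1 for zmin, zmax in zones if zmin <= hi and zmax >= lo)


# Table stored sorted by event_time; user_id cycles through 0..49.
rows = [(t, t % 50) for t in range(1000)]
by_time = build_zone_map(rows, block_size=100, col=0)
by_user = build_zone_map(rows, block_size=100, col=1)
```

Filtering on the sort column (`blocks_to_scan(by_time, 100, 199)`) touches 1 of 10 blocks, while filtering on `user_id == 7` (`blocks_to_scan(by_user, 7, 7)`) touches all 10, because every block spans user_ids 0 to 49. With only one sort order per table, queries filtered on any other column lose most of that pruning.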