For a basic overview: http://en.wikipedia.org/wiki/Paraccel
As for the rest of the article, it reads like Data Warehousing 101 rediscovered. It should have been titled "Analytics: Back To The Future" :-)
How much time and money would have been saved by learning Database Theory/SQL/Data Warehousing/Dimensional Modeling instead of cramming everything into an unstructured data store?
And even at moderate data sizes (10+ GB per table), row-store DBs tend to become painful. This is especially true when you need to support ad-hoc reporting queries, since the usual technique of tuning your schema, indexes, and queries to each other is no longer effective. With truly ad-hoc reporting, your best hope is lots of shallow single-column indices rather than composite ones tuned to a particular query.
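To make the "shallow indices" point concrete, here is a minimal sketch using SQLite as a stand-in row store; the table and column names are hypothetical. The idea is one single-column index per filterable column, so the planner can match whatever predicate an ad-hoc query happens to use:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE events (user_id INT, country TEXT, ts INT, amount REAL)")

# Shallow, single-column indexes: each helps many different ad-hoc filters
# a little, instead of one composite index helping one known query a lot.
for col in ("user_id", "country", "ts"):
    cur.execute(f"CREATE INDEX idx_{col} ON events({col})")

cur.executemany(
    "INSERT INTO events VALUES (?,?,?,?)",
    [(i % 100, "US" if i % 2 else "DE", i, i * 0.5) for i in range(1000)],
)
conn.commit()

# The planner picks whichever shallow index matches this query's predicate.
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM events WHERE country = 'US' AND amount > 10"
).fetchall()
print(plan)
```

The trade-off, as the comment says: no single query gets the composite index that would make it fastest, but every query gets something.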
It's so exhausting to hear how much smarter you are, and that if we just educated ourselves we would realise the error of our ways. People who choose these technologies aren't stupid or masochistic. They understand their use case, and the fact is that there are plenty of situations where SQL is suboptimal.
This weekend I loaded 2 billion rows from S3 both ways:
- From a single gzipped object: 4 hours 42 minutes
- From 2000 gzipped slices of 1M rows each: 17 minutes
(Loading from gzipped files is considerably faster, in addition to saving S3 charges.)
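A minimal sketch of the prep step behind those numbers: splitting one large file into many gzipped slices so Redshift's COPY can load them in parallel across node slices. The function name and file naming scheme are my own; the S3 upload step (e.g. via boto3) and the COPY itself are omitted:

```python
import gzip
import itertools

def write_gzipped_slices(lines, rows_per_slice, prefix):
    """Write `lines` into gzip files prefix.0000.gz, prefix.0001.gz, ..."""
    it = iter(lines)
    names = []
    for n in itertools.count():
        chunk = list(itertools.islice(it, rows_per_slice))
        if not chunk:
            break
        name = f"{prefix}.{n:04d}.gz"
        with gzip.open(name, "wt") as f:
            f.writelines(chunk)
        names.append(name)
    return names
```

After uploading the slices under a common S3 key prefix, a single COPY pointed at that prefix (with the GZIP option) will spread the slices across the cluster, which is where the 4h42m-to-17m difference comes from.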
The article notes that choice of distribution key is critical. I'd add that choice of sort key is equally important. In my testing, a better sort key improved compression from 1.5:1 to 4:1, and also made common queries 5x faster.
Unfortunately, you only get one dist key and one sort key per table, so less common queries could get slower.
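A rough illustration (using gzip as a stand-in, not Redshift's columnar encodings) of why sort key choice moves the compression ratio so much: the same values compress far better once sorted, because long runs emerge. The numbers are synthetic and will differ from the 1.5:1 vs 4:1 figures above:

```python
import gzip
import random

random.seed(0)
# A low-cardinality column, as you might pick for a sort key.
values = [random.choice(["US", "DE", "FR", "JP"]) for _ in range(100_000)]

unsorted_bytes = gzip.compress("\n".join(values).encode())
sorted_bytes = gzip.compress("\n".join(sorted(values)).encode())

# Sorted data collapses into a few huge runs and compresses much smaller.
print(len(unsorted_bytes), len(sorted_bytes))
```

The same effect explains the query speedup: a sort key that clusters the rows a query touches lets the engine skip most blocks entirely.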
The OP's cluster is 16 hs1.xlarge nodes (3 spindles per node). There's a more powerful node type, hs1.8xlarge, with 24 spindles per node. More info: http://aws.amazon.com/redshift/pricing/
So it's not fair to compare Redshift performance to your Vertica cluster unless the hardware is similar.