He talked about how the database world is about to change. ACID is really expensive in terms of resources, and so are the more demanding parts of a relational schema (foreign keys, checks, etc). And the architecture of classic RDBMSes is pretty wasteful -- they use an on-disk format but cache it in memory.
He talked about how there are basically three new paths for DBMSes to follow. 1) Some drop the restrictions to become faster. This is the NoSQL stuff -- you don't really need ACID for writing to a Facebook wall.
This is called a NoSQL database.
2) OLAP. In data warehousing, the usual way to do things is that you load a ridiculous amount of data into the database and then run analytical queries, which tend to use aggregation heavily and often touch just a few dimensions, while the DWH data tend to be pretty wide.
For this, a column store makes perfect sense. It is not very quick on writes, but it can do very fast aggregation and selection of just a few columns.
This is called a column store.
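To make the aggregation point concrete, here's a toy sketch (plain Python, nothing to do with any particular product) contrasting a row layout with a column layout for a wide table -- the columnar sum only touches the one array it needs:

    # Toy illustration: summing one column of a "wide" table.
    # Row store: every row carries all columns, so a scan drags them all along.
    rows = [{"id": i, "region": "EU", "amount": i * 1.5, "note": "x" * 100}
            for i in range(100_000)]
    total_row_store = sum(r["amount"] for r in rows)

    # Column store: each column is its own array; the aggregate reads only
    # the 'amount' column and never touches 'note', 'region', etc.
    amount_column = [i * 1.5 for i in range(100_000)]
    total_column_store = sum(amount_column)

    assert total_row_store == total_column_store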
3) In OLTP, you need throughput, but the question is: how big is your data, and how fast does it grow? RAM sizes tend to grow exponentially, while the number of customers you have will probably grow linearly, or maybe a bit faster, but not by much. So your data could fit into memory, now or in the future.
This allows you to build a very fast database. All you need to do is switch the architecture to memory-based: store the data in a memory-oriented format, both in memory and on disk. You don't read from the disk during operation; you just use it to persist the data on shutdown.
This is called a main memory database.
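And a minimal sketch of that idea (my own simplification with hypothetical names, not any real product's API): keep the working set in an in-memory structure, and touch the disk only to write a snapshot at shutdown and read it back at startup:

    import os
    import pickle

    # Hypothetical main-memory store: all reads and writes hit the dict in RAM;
    # the disk is only used to save a snapshot on shutdown and reload it on startup.
    class MainMemoryStore:
        def __init__(self, snapshot_path="data.snapshot"):
            self.snapshot_path = snapshot_path
            self.data = {}
            if os.path.exists(snapshot_path):
                with open(snapshot_path, "rb") as f:
                    self.data = pickle.load(f)   # reload the last snapshot

        def put(self, key, value):
            self.data[key] = value               # pure in-memory write, no disk I/O

        def get(self, key):
            return self.data.get(key)

        def shutdown(self):
            with open(self.snapshot_path, "wb") as f:
                pickle.dump(self.data, f)        # persist only at shutdown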
So, that was the presentation. It was awesome, and if someone can find it, please give us a link! My search-fu was not strong enough.
...
What interests me is that we have had NoSQL databases for some time already, and we have at least one huge (and very expensive) column store: Teradata. But this seems to be the first actual main memory database.
My dream would be to switch Postgres to main memory or column store mode, but I guess that's not happening very soon :)
Eh, not really...
This is exactly what SAP has been doing for several years via Hasso Plattner and the Potsdam Institute: https://epic.hpi.uni-potsdam.de/Home/HassoPlattner
If you've ever worked with large scale "enterprise" data warehouses, they tend to be slow and clunky. Back in 2006ish SAP took the whole Data Warehouse (well, mainly just the data cubes) and chucked it into a columnar database (at the time it was called TREX, then became BW Accelerator) - http://en.wikipedia.org/wiki/TREX_search_engine
TREX existed well before 2006. SAP also bought a Korean company called P* (IIRC) which did a non-columnar (traditional relational) database and threw it into memory. SAP also had a product called APO LiveCache - http://scn.sap.com/community/scm/apo/livecache - which lived around the same time.
This has now all evolved to a standard offering called SAP HANA - http://www.saphana.com/welcome - In its second year I believe SAP did roughly $360m in sales on HANA alone.
Also, IIRC, isn't InnoDB basically the open source version of exactly what you're talking about with "Postgres to main memory"?
edit- correction in TimesTen
In particular, his criticism of traditional databases seems based more on philosophy rather than evidence.
I'd advise reading both sides of the story:
http://lemire.me/blog/archives/2009/09/16/relational-databas...
http://lemire.me/blog/archives/2009/07/03/column-stores-and-...
http://architects.dzone.com/articles/stonebraker-talk-trigge...
http://gigaom.com/2011/07/11/amazons-werner-vogels-on-the-st...
http://dom.as/2011/07/08/stonebraker-trapped/
The date on some of those posts is interesting. 2009 is quite a while ago now, and I'd suggest that columnar datastores haven't exactly taken over. Some implementations have made some progress (eg Cassandra), but OTOH many non-traditional datastores have added traditional-database-like features (eg, Facebook's SQL front end on their NoSQL system), and traditional databases have added NoSQL features too.
If it can be done alongside the traditional architecture, be it in a fork or without touching existing code, and if you can at least start the work, it could happen soon.
What you're talking about was the HAppS-State component of the HAppS application server, a project which is indeed not active anymore. Happstack is the active fork of HAppS and had a "happstack-state" component for a while, but this was eventually rewritten from scratch and made independent of Happstack and is now known as acid-state [1]. It's even used for the new Hackage server that powers the Haskell package ecosystem.
- a bunch of the novel components (the UPS aware persistence layer, for example) aren't actually built yet
- they're pushing for people to build businesses on it already. I would characterize it as "bleeding-edge with bits of glass glued on", so this doesn't seem entirely honest.
- there's mostly a lot of breathless talk about how great and fast and scalable it is, but no mention of CAP theorem. To boil down their feature set, it's an in-memory RDBMS using the Actor model.
Regarding CAP, I'm not addressing multi-site availability at this stage--I want to get single site fully operational, redundant, and so on.
And, yes, to boil it down, it's an in-memory RDBMS using the Actor model.
The most important feature is that it performs transactions involving records on multiple nodes better than anything. This is the workload that key-value stores functionally cannot do, and which other distributed RDBMSes suffer under. It's also open source, has over 100 pages of technical docs, and is functional enough for people to pound on with some workloads--but not something to put into production yet.
edit: There's a bit of discussion further down about the SQL implementation. That's something I was very curious about as well. The projects linked below spend a lot of time working on supporting full ANSI SQL, and reducing latency by pushing down as many operations as possible. The Overview page doesn't appear to mention how filtering, aggregation, windowing, etc. work in your system.
Also, I noticed on your website that you compare InfiniSQL to Hadoop. How do you feel it compares to Impala (http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-t...) and Shark (https://amplab.cs.berkeley.edu/projects/shark-making-apache-...)?
"In fact, InfiniSQL ought to be the database that companies start with--to avoid migration costs down the road."
This is far from looking for hackers and early adopters. I understand that you're enthusiastic about something you've created, but let's be reasonable for a moment. I'm more than happy to try this out, but starting my business on something that isn't proven yet and has a fair number of durability features yet to be implemented is a no-no.
Edit: My posts on this thread may be coming off as negative and that's not what I intended. I'm cautious of new technology that purports to deliver the world to me on a silver platter. That said, I'd be happy to throw everything I've got at it to see what shakes loose so you can get this to production quality sooner.
If you haven't answered that question yourself, it's likely you have a partition-intolerant system, and that in real-life scenarios people will lose data.
How does a UPS ensure durability against system or program crashes, disk corruption in large clusters, and other failures that can affect a simple write()?
> The real killer for database performance is synchronous transaction log writes. Even with the fastest underlying storage, this activity is the limiting factor for database write performance. InfiniSQL avoids this limiting factor while still retaining durability
How do you plan to implement this (since it appears it hasn't been implemented)? What is your fundamental insight about synchronous transaction logs that makes InifiSQL capable of being durable while (presumably) not having a synchronously written transaction log? If your answer is the UPS, please see my first question.
Edit: I don't see any mention of Paxos anywhere. Could you explain what you're using for consensus?
The fundamental insight about not needing transaction logs is pretty simple actually: if the power is guaranteed to either stay on, or to allow the system to quiesce gracefully, then the cluster will not suddenly crash. That's the motivator for transaction logs--to make sure that the data will still be there if the system suddenly crashes. Get rid of the need for transaction logs, get rid of the transaction logs.
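To illustrate the claim (my own hedged sketch, not InfiniSQL code): the expensive part of a conventional commit is the synchronous fsync of the log record; if you assume the power either stays on or lets the system quiesce, that step -- and the log itself -- can be dropped:

    import os

    # Conventional commit path: append to a write-ahead log and fsync before
    # acknowledging, so the change survives a sudden crash.
    def commit_with_wal(log_file, table, record):
        log_file.write(repr(record).encode() + b"\n")
        log_file.flush()
        os.fsync(log_file.fileno())   # the synchronous write that caps throughput
        table.append(record)

    # "Guaranteed power" commit path: if the cluster is assumed never to lose
    # power uncleanly, the log and the fsync are skipped entirely.
    def commit_without_wal(table, record):
        table.append(record)          # memory-speed commit; durability rests on the UPS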
Regarding consensus, I expect that there will be a quorum protocol in use amongst an odd number greater than 2 of manager processes, each with redundant network and power. But the specific protocol I haven't ironed out. If there's something I can grab off the shelf then it may be preferable to implementing from scratch, but I haven't gotten there yet.
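Just to show the shape of it, a toy majority-vote check is what such a quorum boils down to (this is only an illustration, not the protocol that will actually ship):

    # Toy majority-quorum check among an odd number (> 2) of manager processes.
    # A decision (e.g. "node X is down, fail over") only takes effect if more
    # than half of the managers agree, so a split cluster can't make two
    # conflicting decisions at once.
    def has_quorum(votes_for, total_managers):
        assert total_managers > 2 and total_managers % 2 == 1
        return votes_for > total_managers // 2

    print(has_quorum(2, 3))   # True: 2 of 3 managers agree
    print(has_quorum(2, 5))   # False: a minority can't decide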
This stuff hasn't been implemented yet, but the core around which it can be implemented, has been.
Do I sense a volunteer? ;-)
Kudos for engaging the community though; please do keep us posted as you progress.
You may get more volunteers by publishing a paper outlining the core concept. I clearly remember reading things like the H-Store paper, or the Dremel paper, and saying "damn, this makes sense and is really cool". Implementation details can be worked out and engineering approaches tried. But the underlying concept should be clear.
Clause 13 is a real pain to deal with when exposing this over the network.
I guess the developer wants to sell a license (like the mysql java client GPL'ing).
Can't blame him, he needs to get paid.
I thankfully don't have to - this means I don't need to talk to lawyers about this.
Because AGPL took away the most important bit of unassailable ground I had to argue with when it came to deploying GPL - "Using this code implies no criteria we have to comply to, only if we distribute it".
Clause 12 and 13 - basically took that away from me completely.
Look, I'm not going to tell you what license to use.
But leave me enough room to complain that I have had trouble convincing people that we can use AGPL code in a critical function without obtaining a previous commercial license by paying the developer.
On their site though, it says no sharding and that it can do these 500Ktx/sec even when each transaction involves data on multiple nodes. Does this performance degrade directly in relation to the number of nodes a tx needs to touch?
A simple, straightforward, wire-level description of how things work when coordinating and performing transactions across nodes would be very useful. There's a lot of excited talk about actors, but nothing that really examines why this is faster, or any sort of technical analysis.
If you want more details about how things work when performing transactions, I think that the overview I created would be a good starting point. It probably doesn't have everything you'd ask for, but I hope that it answers some of your questions: http://www.infinisql.org/docs/overview/
And to answer your question about performance degradation pertaining to number of nodes each transaction touches, I have not done comprehensive benchmarks of InfiniSQL measuring that particular item. However, I do believe that as multi-node communication increases, throughput will tend to decrease--I expect that the degradation would be graceful, but further testing is required. The benchmark I've performed and referenced has 3 updates and a select in each transaction, all very likely to be on 3 different nodes.
I'd like to invite you to benchmark InfiniSQL in your own environment. I've included the scripts in the source distribution, as well as a guide on how I benchmarked, as well as details on the specific benchmarking I've done so far. All at http://www.infinisql.org
I'd be glad to assist in any way, give pointers, and so on, if there are tests that you'd like to do. I also plan to do further benchmarking over time, and I'll update the site's blog (and twitter, etc) as I do so.
Please communicate further with me if you're curious.
Thanks, Mark
InfiniSQL says don't worry, we'll just use 2PC. But not just yet, we're still working on the lock manager.
I look forward to your exegesis of how you plan to overcome the well-documented scaling problems with 2PC. Preferably after you have working code. :)
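For anyone who hasn't hit those problems first hand, here's roughly why 2PC hurts (a generic sketch in Python, not anything from InfiniSQL): the coordinator makes two synchronous round trips per transaction, and every participant that votes "yes" sits on its locks until it hears the outcome.

    # Bare-bones two-phase commit coordinator. The pain points: two synchronous
    # round trips per transaction, and participants hold their locks while
    # waiting for the global decision.
    def two_phase_commit(participants, txn):
        # Phase 1: ask everyone to prepare (each participant locks its rows
        # and promises it can commit).
        votes = [p.prepare(txn) for p in participants]

        # Phase 2: commit only if every vote was "yes", otherwise abort.
        if all(votes):
            for p in participants:
                p.commit(txn)    # locks are released only now
            return "committed"
        for p in participants:
            p.abort(txn)
        return "aborted"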
I'm not much of an expert at all, but I like reading papers on databases. It seems to me that if you really did discover a breakthrough like this, you should be able to distill it to some basic algorithms and math. And a breakthrough of this scale would be quite notable.
If I'm reading correctly, there's no replica code even involved ATM. So 500Ktx/s really boils down to ~83Ktx/sec per node, on an in-memory database. Is it possible on modern hardware that this is just what to expect?
I am curious, and I'm not trying to be dismissive, but the copy sounds overly promising, without explaining how, even in theory, this will actually work. I'd suggest to explain that part first, then let the engineering come second.
I can't seem to find the word "Reliable" or any variation thereof anywhere in there.
In fact, that word is nowhere to be found in the blog post or on the entire InfiniSQL page (not in the Overview, Guides, Reference or even the FAQ). I find this quite remarkable, since reliability is the true virtue of an RDBMS, not speed or even capacity. At least that's what PostgreSQL aims for, and this being another RDBMS that is also open source, I see it as InfiniSQL's only direct competitor.
It's nice that this is scalable, apparently, to ridiculous levels, but if I can't retrieve what I store in exactly the same shape as I stored it, then that's a bit of a buzz kill for me.
Can we have some assurance that this is the case?
There's a note on "Durability" and a shot at log file writing for transactions, and presumably InfiniSQL uses concurrency and replicas to provide it. In the Data Storage section, it mentions that InfiniSQL is still an in-memory database for the most part http://www.infinisql.org/docs/overview/#idp37053600
What they're describing is a massively redundant, UPS backed, in-memory cache.
Am I wrong?
I promise that I have every intention of making InfiniSQL a platform that does not lose data. I have a long career working in environments that demand 100% data integrity. If I de-emphasized it, it was not intentional.
PostgreSQL doesn't scale for OLTP workloads past a single node. There are a handful of products similar to InfiniSQL (google for the term NewSQL for a survey of them).
And yes, a redundant UPS-backed in-memory cache. I have some ideas on how to do regular disk backing as well (which I'm sure you've read).
And if a more traditional log-based storage layer is added, InfiniSQL will still scale nearly linearly across nodes horizontally. Multi-node scale and in-memory are not dependent on one another. Though I believe that redundant UPS systems, managed by a quorum of administrative agents, can provide durability just like writing to disk.
Are you familiar with high end storage arrays, such as from HDS or EMC? They write to redundant memory, battery backed and managed by logic in the arrays. I'm just moving that type of design to protect the database application itself, up from the block layer.
And some people trust their datacenter power--they use pure in-memory databases without UPS already, or they do things like asynchronously write transaction log, which also sacrifices durability. For those groups, InfiniSQL ought to be just fine, without UPS systems.
But I agree that other write (and read) activity going on in the background and foreground also limits performance--and in fact, I've seen the index write bottleneck that you describe in real life, more so than simple transaction log writes. So, you're correct.
I've read about Toku, but I really doubt that it writes faster to disk than writing to memory. Are you really trying to say that?
I think it would be great for InfiniSQL to be adapted to disk-backed storage, in addition to memory. The horizontal scalability will also apply, making for a very large group of fast disk-backed nodes.
I think your input is good.
If you're planning to write data faster than disk bandwidth, then you have no hope of being durable and we're talking about problems too different to be worth comparing, and in that case I retract my comment.
I don't understand what distinction you're trying to make between the "array itself" and the "log file has a limit to how many blocks can be appended". Can you clarify what limit you're talking about?
Yes, the issue usually isn't the transaction log append speed. Instead, it happens too frequently that the log is configured to be too small. A log file switch causes a flush of accumulated modified data blocks of tables and indexes [buffer cache flush in Oracle parlance] from RAM to disk. With a small log file size, the flush happens too frequently for too small an amount of modified data--this is where the random IO that GP mentioned bites you in the neck.
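A quick back-of-the-envelope run of that effect (made-up numbers, purely illustrative): with small redo logs the switch interval collapses and the checkpoint flush is running almost all the time.

    # Illustrative arithmetic: how often a log switch (and the checkpoint flush
    # it triggers) happens for a given redo generation rate and log file size.
    redo_rate_mb_per_sec = 50                 # made-up sustained redo rate
    for log_size_mb in (64, 512, 4096):
        seconds_between_switches = log_size_mb / redo_rate_mb_per_sec
        print(f"{log_size_mb:5d} MB log -> switch every {seconds_between_switches:6.1f} s")
    # 64 MB   -> a switch (and flush) every ~1.3 s: random IO all the time
    # 4096 MB -> a switch every ~82 s: flushes are batched, far less random IO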
If you actually have transaction processing at this scale and need that performance, the RAM cost is not a major issue.
It scales as long as throughput increases while new nodes are added. I've done benchmarking up to 12 nodes, and it continued to scale nearly linearly. (http://www.infinisql.org/blog). I'd like to push it further, but need $$$ for bigger benchmark environments.
I hope there's room for competition in this space still.
I use #line statements because I get compiler messages from time to time that put things on the wrong line after having imported headers.