1) it is a better technical fit for a very specific problem
2) there is already a legacy db in place
I have been voted down at a couple of startups that wanted to run a "MEAN" stack; invariably, all of those startups moved off MongoDB or shut down.
The only time I will advocate for anything other than Postgres is when Wordpress is involved. If the data model is simple enough then MySQL is more than up for the task, and it avoids an additional database dependency.
Thankfully all the ORMs that are worth using support both MySQL and Postgres, so using both is very doable.
### Useful Postgres (or SQL in general) tools/libraries:
- Bookshelf, an ORM: http://bookshelfjs.org/
- PostgREST, an automated REST API: https://github.com/begriffs/postgrest
- Sqitch, git-style migration management: http://sqitch.org/
It isn't a popular opinion on HN, but I will still advocate for Oracle or SQL Server in terms of tooling, cluster scaling, server-side programming, and DB drivers.
Then again, we work with customers for whom Oracle and SQL Server license costs aren't an issue.
If you take the issues of open-source and licensing out of the equation (both important issues in their own right, but not related to the point at hand) then Oracle and SQL Server are both ridiculously good.
I personally try to avoid them (due to the cost, and the lock-in) but they are astoundingly performant and featureful RDBMSs with a huge amount of support and documentation behind them.
2) "server side programming": since you have the source code with PostgreSQL and there are plugins for most major programming languages, I don't buy this.
3) If you had spent the same on EnterpriseDB or CitusDB, I can guarantee you would get similar polish and support. People don't.
Something else you might be interested in:
https://www.linkedin.com/pulse/mongodb-32-now-powered-postgr...
I have always been a Postgres fan, but now I find it very difficult to imagine a problem MongoDB would be better suited for.
Inspired by PostgREST linked above, it automatically builds a GraphQL API by reflecting on the Postgres schema.
For instance, at my current startup, we employ at least 7 different databases (including PG). And, I don't say that to brag - each has a specific use for the problem at hand.
You have to consider the needs and trade-offs of your specific project. And, if you can, try to isolate your storage behind some interface, because often your needs will change.
(I say this with 20+ years of experience and over a dozen commercially successful products under my belt.)
Data has more value when combined, so a wise database choice also depends on what you already have. You can combine data from different systems (e.g. FDWs, which postgres has great support for), but something is usually lost along the way.
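As a sketch of the FDW approach mentioned above (the server name, connection options, and table names here are all hypothetical):

```sql
-- Attach a remote Postgres database via the postgres_fdw extension,
-- then query its tables alongside local ones.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER legacy_db FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'legacy.example.com', dbname 'crm');

CREATE USER MAPPING FOR CURRENT_USER SERVER legacy_db
    OPTIONS (user 'reporting', password 'secret');

CREATE SCHEMA legacy;
IMPORT FOREIGN SCHEMA public LIMIT TO (customers)
    FROM SERVER legacy_db INTO legacy;

-- Join remote data with a local table as if it were all one database.
SELECT o.id, c.name
FROM orders o
JOIN legacy.customers c ON c.id = o.customer_id;
```

Pushing filters and joins down to the remote side is where "something is usually lost" — the planner can't always do it, so cross-database joins may pull more rows over the wire than you'd like.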
I think postgres is a good default, because it is good for a lot of things. But sure, if you make an informed decision otherwise, there's nothing wrong with that.
It doesn't matter whether you have your needs clearly defined: PostgreSQL is a jack of all trades and a master of many.
An abstracted interface for your data layer is a must-have.
Many startups and projects begin with a single db and grow into new dbs as the business requirements change.
Obviously if you know your data model well enough you can foresee a lot of these requirements and pick the right tool for the job.
However, you will usually need to pick a db to start with and hope that it will accommodate as many of those unknowns as possible.
I believe that Postgres is the best choice in that scenario, as it is extremely mature and has an incredible amount of flexibility.
The reason why Postgres is so popular here is because if you want to have an open source solution, Postgres is the best available. I'm sure there are better proprietary solutions, but even though it is free, PG is pretty darn close to them.
Why? Especially after point #1 and assuming the document-store was a good fit for the data model.
Many of the NoSQL databases essentially take us back to the '60s, before Codd came up with the relational model[1]. These ideas come back once in a while [2][3], but so far nothing has proven better than the relational model.
NoSQL still makes sense in many cases (generally when your specific use case does not need all the guarantees of ACID); in return you can get higher performance or horizontal scalability.
MongoDB aims to be a generic database but rides the wave of NoSQL "coolness". It was created by people who had no experience with databases and were learning as they went[4]. As they started adding things that are essential for a database, they realized it's not that simple.
Currently MongoDB is outperformed by Postgres. In fact, there is an application called ToroDB which provides MongoDB protocol compatibility on top of Postgres, and even that outperforms Mongo. Mongo also doesn't scale well horizontally, so you don't really get any significant advantage.
[1] https://en.wikipedia.org/wiki/Hierarchical_database_model
[2] https://en.wikipedia.org/wiki/Object_database
[3] https://en.wikipedia.org/wiki/XML_database
[4] For example, they started with mmap'ed memory regions to store the data and did not initially use fsync() to make sure data was saved on disk. In a way it reminds me of MySQL several years ago. It's much better now than it was in the past, but it has a lot of warts left over from the past.
The main problem someone is thinking about may be solved by a document DB, but then they also have 1000 other problems they didn't think about that are a horrible fit for a document DB.
And due to the nature of databases, where data has greater value when combined, using a new database system for every application isn't a great option.
SQL is great for business data, and passable-to-good for everything else. So the only way to beat it is by being way better at some specific thing, and finding users where that specific thing is so important that it's OK if the data is on an island.
IMHO, unless you have a situation such that Mongo's horizontal scaling is actually required, it's better to use Postgres even if you're doing document-store stuff.
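To illustrate the "document-store stuff in Postgres" point, a minimal sketch using a jsonb column (the table and document shape are hypothetical):

```sql
-- Store schemaless documents in a jsonb column.
CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL
);

-- A GIN index makes containment queries on the documents fast.
CREATE INDEX events_payload_idx ON events USING gin (payload);

INSERT INTO events (payload)
VALUES ('{"type": "signup", "user": {"id": 42, "plan": "free"}}');

-- Query inside the documents, Mongo-style, with full SQL still available.
SELECT payload->'user'->>'id' AS user_id
FROM events
WHERE payload @> '{"type": "signup"}';
```

You keep ACID transactions and can join the documents against ordinary relational tables, which is exactly what you give up with a dedicated document store.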
The only time I would recommend Mongo is for storing geospatial data, as it has several built-ins that make it much easier to work with. Even then I find it a lot more convenient to keep that data in Postgres and replicate it to Mongo.
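For comparison, Postgres with the PostGIS extension covers the common geospatial cases too. A sketch, assuming PostGIS is installed (the table, columns, and coordinates are hypothetical):

```sql
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE places (
    id   serial PRIMARY KEY,
    name text,
    geom geography(Point, 4326)
);

-- A GiST index accelerates spatial searches.
CREATE INDEX places_geom_idx ON places USING gist (geom);

-- Everything within 5 km of a point, nearest first.
SELECT name
FROM places
WHERE ST_DWithin(geom, ST_MakePoint(-122.42, 37.77)::geography, 5000)
ORDER BY ST_Distance(geom, ST_MakePoint(-122.42, 37.77)::geography);
```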
Our framework only has a mysql adapter for now, but it should be pretty easy to add a postgres one.
Generally, if you can, you should consider supporting Postgres in your framework. It's a much saner and more robust database from an Ops point of view (replication doesn't fail as often) and more flexible from a Dev view (performance is generally more predictable, and there's a much wider feature set with document storage, PostGIS, etc.).
I had never touched PostgreSQL before, nor any Linux performance tools, but I noticed that replacing certain buffer eviction locks with atomic implementations could drastically help this particular case. I emailed the list about it and Andres was someone who chimed in with helpful advice. I wrote up what I'd discovered in my deep dive here: http://tiny.cc/postgres-concurrency
Turns out Andres was already working on a "better atomics" patch to provide easier methods of using atomic operations within PostgreSQL's code base (my patch was a quick hack probably only valid on x86, if that). It's been useful in removing several performance bottlenecks and—two years in—it looks like it's still paying off.
I hope to have time to play around with it once I have new hardware (I refuse to do performance development on virtual machines).
But honestly, most remaining performance/scalability problems in pg are algorithmic in nature. So micro-optimizations, and that's what I'd call TSX/HLE, aren't likely to be the biggest bottleneck.
However, if the data grows beyond what a single scaled-up machine can handle, take a look at Druid (druid.io). It is a bit more involved to set up than Influx, but was built from the very beginning to scale out horizontally. As a result, I can do realtime analysis (using Grafana) of over 10 billion data points and perform aggregations over said data. It is an incredibly useful tool, and the newly released 0.9 looks even better.
It can also count Alibaba, eBay, Cisco, PayPal, Yahoo, and Netflix as users (among many others): http://druid.io/druid-powered.html
It is really impressive tech. Bonus points that some of the original founders of Druid from Metamarkets just founded a company to do enterprise support around it:
You also have to spend some time to tune the query cost settings to avoid sequential scans if you're only gonna work with a subset of the data. Another optimization could be implementing table inheritance so you have a table for every year. If you work with data sets for a specific year you would get a big performance boost with sequential scans.
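The inheritance-based partitioning idea above can be sketched like this (table and column names hypothetical; declarative partitioning only arrived later, in Postgres 10):

```sql
CREATE TABLE measurements (
    recorded_at timestamptz NOT NULL,
    value       double precision
);

-- One child table per year, with CHECK constraints the planner can use.
CREATE TABLE measurements_2015 (
    CHECK (recorded_at >= '2015-01-01' AND recorded_at < '2016-01-01')
) INHERITS (measurements);

CREATE TABLE measurements_2016 (
    CHECK (recorded_at >= '2016-01-01' AND recorded_at < '2017-01-01')
) INHERITS (measurements);

-- With constraint exclusion enabled, a query filtered to one year
-- only scans that year's child table.
SET constraint_exclusion = partition;
SELECT avg(value)
FROM measurements
WHERE recorded_at >= '2016-01-01' AND recorded_at < '2017-01-01';
```

Inserts have to be routed to the right child (via a trigger or by inserting into the child directly), which is the main bookkeeping cost of this scheme.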
PostgreSQL's biggest weakness at the moment is aggregating data using several CPU workers/cores. This is coming in PostgreSQL 9.6.
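In 9.6 that looks roughly like the following (the table name is hypothetical):

```sql
-- Allow up to 4 parallel workers per Gather node.
SET max_parallel_workers_per_gather = 4;

-- The plan for an aggregate over a large table should now show a
-- Gather node with Partial Aggregate steps running in workers.
EXPLAIN SELECT count(*) FROM big_table;
```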
Oh, and I run PostgreSQL on ZFS with LZ4 compression.
I'm generally a big advocate of ZFS, but I've heard that COW file systems (ZFS and btrfs) are generally not a good choice for database workloads.
How does it perform for you?
Curious since I'm currently researching how PostgreSQL could do better in this space :)
There doesn't seem to be any silver bullet yet. And it is also hard to even see how relational database compares to the existing solutions, since most people dismiss it immediately.
[1] http://stackoverflow.com/questions/17342176/max-distinct-and...
[1]: http://www.enterprisedb.com/success-stories/postgresql-succe...
Unless PG has some timeseries-specific extensions, I had assumed a TS-specific database would be more appropriate. Also curious to try Riak TS.
Disclaimer: I work for Basho, makers of Riak TS.
In partnership with IBM we researched PostgreSQL scalability on modern Power8 servers.
That statement and the linked Russian blog white paper[1] make it seem like a Power8-specific result, and Power8 is "a massively multithreaded chip"[2]. I wonder how far off x86-64 would be? Also, the chart indicates that the benchmarks shown were run on Xeon chips.
The final straw came when I stood up a MySQL 5.6 instance to use as a data warehouse for about 5TB of data (15 billion rows). To my horror, after spending a few weeks on this project, I discovered that MySQL only supported a small subset of the SQL language. Most of the SQL features needed for data warehouse queries were simply not there (no WITH clause, no ROW_NUMBER(), only one sequence per table, no schema support, and many others that SQL report ninjas love to use).
With that I had to go back to management and convince them to put more time into the project so we could try again with PostgreSQL. There was a large learning curve going to Postgres. It was just different in many areas, but at the same time it reminded me a lot of IBM DB2 (both were products of the 1980s if you look at the history). Postgres fit into our plans nicely; it actually had a full-blown intelligent query parser and optimizer, unlike MySQL. From what I learned, MySQL's purpose was to stop people from using text files as storage in the early 2000s. But once you graduate from that, it's onward to Postgres.
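A typical reporting query using the features that MySQL version lacked (the table and columns here are hypothetical):

```sql
-- Top sale per region, using a CTE (WITH) and ROW_NUMBER();
-- neither was available in MySQL 5.6.
WITH ranked AS (
    SELECT region,
           amount,
           ROW_NUMBER() OVER (PARTITION BY region
                              ORDER BY amount DESC) AS rn
    FROM sales
)
SELECT region, amount
FROM ranked
WHERE rn = 1;
```

Without window functions, the same result takes correlated subqueries or self-joins, which get painful fast at 15 billion rows.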
Well shit, SQLite has that market sewn up these days.
Did you not research this beforehand?
No there hasn't:
http://db-engines.com/en/ranking
Which makes sense since they aren't really playing in the same space. The benefits of PostgreSQL are largely lost on typical MySQL use cases (light load, simple CRUD access patterns, limited use of JSON/BSON etc).
You'll see the same phenomenon on Slashdot.
MySQL is popular among a subset of programmers: web developers. In corporations, Microsoft SQL and Oracle are more popular. Further, MySQL is popular among a subset of web developers: those who use PHP. Among web developers who use Python, Postgres seems more popular.
My suspicion is that MySQL's popularity is tied to those web-hosting plans (1 GB of storage! 1 TB of bandwidth! 10 databases! $10/month!). They were almost always MySQL databases. Web hosts that offered Postgres databases were few and far between. This article corroborates my theory: http://rodner.blogspot.com/2008/01/what-makes-mysql-so-popul.... MySQL's company (which at the time was "MySQL" not Oracle) pushed it in at a critical time in the development of the web.
I have written applications that use: Postgres, MySQL, SQLite, Microsoft SQL, and Oracle. My favorite by far is Postgres. The strangest thing to me is not that MySQL is more popular than Postgres, but that anyone uses Microsoft or Oracle at all. Not only do they cost a lot, but from a purely technical standpoint they are worse.
Based on what? Both commercial databases have incredible features, tooling and extensions that leave postgres behind. Postgres today doesn't even have a solid scale-up or scale-out strategy.
It does have nice SQL support and makes developers' lives a little easier, but this isn't anywhere close to making it the absolute winner technically.
Could you elaborate a bit more on why you view Microsoft SQL Server as worse?
A lot of decisions are made holistically instead of a "purely technical standpoint". In large companies where there are still IT departments (e.g. use of Active Directory), MS is still very much the de facto platform, and SQL Server is the "logical" choice.
However, the trend seems to be from MySQL to MariaDB. Red Hat and many other Linux distros, for example, have moved to MariaDB. It doesn't feature Larry Ellison. So why not address it?
Slapdash web developers.
Most web developers I know now consider postgres the base standard.
Also, MySQL is run by Oracle... and Oracle is a horrible company run by horrible people that behave horribly, and it should never ever be trusted in any form. Any developer relying on Oracle code should not be trusted ever again, and Oracle consultants are the vilest form of life on earth.
Did I mention I don't like Oracle?
Say what you like about oracle, they have been a great steward of MySQL.
IME there isn't really a compelling case to use MySQL over Postgres, unless there's a specific environmental constraint.
When a new version of MySQL comes out it's big news, but it seems like lots of the little day-to-day updates in Postgres get posted as it gets developed.
Self-explanatory, really.
- a couple of weeks ago we released a document store (new protocol + connectors) based on protobufs
- last year 5.7 was released. Many new features, I have a list here: http://www.thecompletelistoffeatures.com
I'm trying to get a handle on the different databases, and VoltDB sounds exciting, but everyone's talking about PostgreSQL. Then there's Mnesia which I hear is, as all things Erlang, excellent, though it's kinda tied to Erlang.
I know it's hard to say what's best, but what would you say is the best DB for a completely new multilingual project that needs throughput but prioritizes low latency, for example?
Also, VoltDB is licensed under AGPL. Does this mean that it can't be used in commercial projects? Or is it OK as long as the other components are on different servers or similar?
VoltDB, as an example, is very different: it seems to be designed for simple OLTP workloads. It's an in-memory database, which offers opportunities for impressive performance, but if you have a large amount of data, you'll need a large amount of memory. And horizontal scaling is cool, but cross-partition operations will incur significant overhead.
(IANAL, but the AGPL requires that network users of the software be able to download the source. Since a presumably proprietary application is the client in this case, this isn't likely to be an issue.)
If you don't know the difference, you probably want Postgres.
VoltDB is a specialty database for things like high frequency trading. It wouldn't make sense to use for, say, a consumer app or web startup.
I'd consider something else if I'm really, really sure that it's better suited to whatever niche problem than Postgres, but it'd take a lot of thinking and convincing myself.
This isn't necessarily a bad thing; it's a very easy model to reason about, it rarely has an issue if your cluster doesn't span more than one data center, and it lets you know when it happens via events you can subscribe to. But doing anything different, up to and including healing the partition automatically, is left to the user.
There are also a few warts due to its history: disc-only tables have a rather small max size (so if you're not expecting everything to be stored in RAM as well, you don't want to use Mnesia), indexes are shockingly inefficient for writes, and a few other odds and ends that I don't really remember. For persisted, but fast, storage in Erlang, where partitions aren't common, it's great, but outside of those sorts of use cases there's probably something better.
11.5 How much data can be stored in Mnesia?
Dets uses 32 bit integers for file offsets, so the largest possible mnesia table (for now) is 4Gb.
Ehhhhhhhh, kinda, but not really. A couple of things: [0]
* If you use a disc_only_copies table, your max table size is ~2GB because that's apparently the largest file that DETS will work with. What's more, if your table grows to ~2GB, future writes (to that table) will silently fail until the table size is reduced (!!!). You can kinda get around this limitation by sharding your table [1], but the hash used to determine which key goes to which shard isn't very balanced... sometimes you luck out and your shards are pretty much all the same size. Other times you end up with a few outsized shards.
* However! Everyone on the erlang-questions mailing list says that disc_copies and ram_copies tables are _NOT_ subject to this limit, so they can grow arbitrarily large.
* And, the general consensus seems to be that you really don't want to be using Mnesia in disc_only_copies mode... if you have too much data to fit into RAM, you should probably consider using something else to store your data. This isn't to say that using Mnesia in disc_only mode is bad or hazardous, [2] but that it's substantially slower than disc_copies or ram_copies, and you run the risk of bumping up against the DETS file size limitation.
[0] All observations valid for Erlang 17->18 only. Check Erlang release notes to see if major changes have occurred.
[1] Mnesia handles sharded tables really well, even if the documentation for the feature isn't the best.
[2] I use it in disc_only mode for one of my projects... it's how I learned about that mode's limitations. :p
Do they mean 4GB? Surely 0.5GB/4Gb is a bit small even for 32bit?
Given we're in code freeze anyway, I've not spent a lot of thought on that yet; but I suspect that rearchitecting things so the lines are dirtied fewer times is the better fix, with less potential for regressions at lower client counts.
Compatible up to: 3.4.2
Last Updated: 2 years ago
Active Installs: 400+
WordPress 4.5 is current. Anyone doing WordPress will just skip this if they don't have developers who can fix the issues. Also, while the base may work, a lot of popular plugins (used to?) don't utilize WP_Query or whatever else WordPress offers. It's also probably easier to add indexes to the code or rewrite the logic. Mostly it's bad database code that sucks on MySQL and would likely suck the same way on PostgreSQL.
Besides that, if you use an object cache like memcache or Redis you can avoid a lot of database accesses, and InnoDB seems to be able to deal with concurrent access to tables like wp_comment.
Would be cool to see in WordPress itself but I doubt it.
Sounds like a situation I'm facing: assisting a non-profit org that wants a new website. Some members push for using Wordpress as the CMS. However the org has an established pgsql db containing lots of data that optimally could be employed on the site, e.g., lists of members/events, searchable documents, etc.
Also, there's a separate web app for accessing the database using SQL features mysql doesn't support. Lack of Wordpress/pgsql compatibility makes integrating the existing db and folding in the webapp functionality just about impossible, at least I'm not seeing how that could be done.
Since Wordpress isn't going to embrace pgsql, I've recommended considering alternative CMSs. Decisions are still pending.
How do these changes affect more heterogeneous workflows, of mixed reads and writes? A little better? A lot better? A little worse?
T<what?> Per Second?