1) it is a better technical fit for a very specific problem
2) there is already a legacy db in place
I have been voted down at a couple of startups that wanted to run a "MEAN" stack; invariably, all of those startups moved off MongoDB or shut down.
The only time I will advocate for anything other than Postgres is when Wordpress is involved. If the data model is simple enough then MySQL is more than up for the task, and it avoids an additional database dependency.
Thankfully all the ORMs that are worth using support both MySQL and Postgres, so using both is very doable.
### Useful Postgres (or SQL in general) tools/libraries:
- Bookshelf, an ORM: http://bookshelfjs.org/
- PostgREST, an automated REST API: https://github.com/begriffs/postgrest
- Sqitch, git-style migration management: http://sqitch.org/
It isn't a popular opinion on HN, but I will still advocate for Oracle or SQL Server in terms of tooling, cluster scaling, server-side programming, and DB drivers.
Then again, we work with customers for whom Oracle and SQL Server license costs aren't an issue.
If you take the issues of open-source and licensing out of the equation (both important issues in their own right, but not related to the point at hand) then Oracle and SQL Server are both ridiculously good.
I personally try to avoid them (due to the cost, and the lock-in) but they are astoundingly performant and featureful RDBMSs with a huge amount of support and documentation behind them.
2) "server side programming": since you have the source code with PostgreSQL and there are plugins for most major programming languages, I don't buy this.
3) If you had spent the same on EnterpriseDB or CitusDB, I can guarantee you would get similar polish and support. People don't.
Something else you might be interested in:
https://www.linkedin.com/pulse/mongodb-32-now-powered-postgr...
I have always been a Postgres fan, but now I find it very difficult to imagine a problem MongoDB would be better suited for.
Inspired by PostgREST linked above, it automatically builds a GraphQL API by reflecting on the Postgres schema.
For instance, at my current startup, we employ at least 7 different databases (including PG). And, I don't say that to brag - each has a specific use for the problem at hand.
You have to consider the needs and trade-offs of your specific project. And, if you can, try to isolate your storage behind some interface, because often your needs will change.
(I say this with 20+ years of experience and over a dozen commercially successful products under my belt.)
Data has more value when combined, so a wise database choice also depends on what you already have. You can combine data from different systems (e.g. FDWs, which postgres has great support for), but something is usually lost along the way.
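As a sketch of the FDW approach mentioned above (the server name, connection options, and table names here are all hypothetical):

```sql
-- Attach a remote Postgres database via the postgres_fdw extension,
-- then query its tables alongside local ones.
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER legacy_db FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'legacy.example.com', dbname 'crm');

CREATE USER MAPPING FOR CURRENT_USER SERVER legacy_db
    OPTIONS (user 'reporting', password 'secret');

CREATE SCHEMA legacy;
IMPORT FOREIGN SCHEMA public LIMIT TO (customers)
    FROM SERVER legacy_db INTO legacy;

-- Join remote data with a local table as if it were all one database.
SELECT o.id, c.name
FROM orders o
JOIN legacy.customers c ON c.id = o.customer_id;
```

Pushing filters and joins down to the remote side is where "something is usually lost" — the planner can't always do it, so cross-database joins may pull more rows over the wire than you'd like.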
I think postgres is a good default, because it is good for a lot of things. But sure, if you make an informed decision otherwise, there's nothing wrong with that.
It doesn't matter whether you have your needs clearly defined: PostgreSQL is a jack of all trades and a master of many.
An abstracted interface for your data layer is a must-have.
Many startups and projects begin with a single db and grow into new dbs as the business requirements change.
Obviously if you know your data model well enough you can foresee a lot of these requirements and pick the right tool for the job.
However, you will usually need to pick a db to start with and hope that it will accommodate as many of those unknowns as possible.
I believe that Postgres is the best choice in that scenario, as it is extremely mature and has an incredible amount of flexibility.
The reason why Postgres is so popular here is because if you want to have an open source solution, Postgres is the best available. I'm sure there are better proprietary solutions, but even though it is free, PG is pretty darn close to them.
Why? Especially after point #1 and assuming the document-store was a good fit for the data model.
Many of the NoSQL databases essentially take us back to the '60s, before Codd came up with the relational model[1]. These ideas come back once in a while [2][3], but so far nothing has proven better than the relational model.
NoSQL still makes sense in many cases (generally when your specific use case does not need all the guarantees of ACID); in return you can get higher performance or horizontal scalability.
MongoDB aims to be a generic database but rides the wave of NoSQL "coolness". It was created by people who had no experience with databases and were learning as they went[4]. As they started adding things that are essential for a database, they realized it's not that simple.
Currently MongoDB is outperformed by Postgres. In fact, there is an application called ToroDB which provides MongoDB protocol compatibility on top of Postgres, and even that outperforms Mongo. Mongo also doesn't scale well horizontally, so you don't really get any significant advantage.
[1] https://en.wikipedia.org/wiki/Hierarchical_database_model
[2] https://en.wikipedia.org/wiki/Object_database
[3] https://en.wikipedia.org/wiki/XML_database
[4] For example, they started with mmap'ed memory regions to store the data and did not initially use fsync() to make sure data was saved on disk. In a way it reminds me of MySQL several years ago. It's much better now than it was in the past, but it has a lot of warts left over from the past.
The main problem someone is thinking about may be solved by a document DB, but then they also have 1000 other problems they didn't think about that are a horrible fit for a document DB.
And due to the nature of databases, where data has greater value when combined, using a new database system for every application isn't a great option.
SQL is great for business data, and passable-to-good for everything else. So the only way to beat it is by being way better at some specific thing, and finding users where that specific thing is so important that it's OK if the data is on an island.
IMHO, unless you have a situation such that Mongo's horizontal scaling is actually required, it's better to use Postgres even if you're doing document-store stuff.
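To illustrate the "document-store stuff in Postgres" point, a minimal sketch using a jsonb column (the table and document shape are hypothetical):

```sql
-- Store schemaless documents in a jsonb column.
CREATE TABLE events (
    id      bigserial PRIMARY KEY,
    payload jsonb NOT NULL
);

-- A GIN index makes containment queries on the documents fast.
CREATE INDEX events_payload_idx ON events USING gin (payload);

INSERT INTO events (payload)
VALUES ('{"type": "signup", "user": {"id": 42, "plan": "free"}}');

-- Query inside the documents, Mongo-style, with full SQL still available.
SELECT payload->'user'->>'id' AS user_id
FROM events
WHERE payload @> '{"type": "signup"}';
```

You keep ACID transactions and can join the documents against ordinary relational tables, which is exactly what you give up with a dedicated document store.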
The only time I would recommend Mongo is for storing geospatial data, as it has several built-ins that make it much easier to work with. Even then I find it a lot more convenient to keep that data in Postgres and replicate it to Mongo.
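For comparison, Postgres with the PostGIS extension covers the common geospatial cases too. A sketch, assuming PostGIS is installed (the table, columns, and coordinates are hypothetical):

```sql
CREATE EXTENSION IF NOT EXISTS postgis;

CREATE TABLE places (
    id   serial PRIMARY KEY,
    name text,
    geom geography(Point, 4326)
);

-- A GiST index accelerates spatial searches.
CREATE INDEX places_geom_idx ON places USING gist (geom);

-- Everything within 5 km of a point, nearest first.
SELECT name
FROM places
WHERE ST_DWithin(geom, ST_MakePoint(-122.42, 37.77)::geography, 5000)
ORDER BY ST_Distance(geom, ST_MakePoint(-122.42, 37.77)::geography);
```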
Our framework only has a mysql adapter for now, but it should be pretty easy to add a postgres one.
Generally, if you can, you should consider supporting Postgres in your framework. It's a much saner and more robust database from an Ops point of view (replication doesn't fail as often) and more flexible from a Dev view (performance is generally more predictable, and there's a much wider feature set with document storage, PostGIS, etc.).
I had never touched PostgreSQL before, nor any Linux performance tools, but I noticed that replacing certain buffer eviction locks with atomic implementations could drastically help this particular case. I emailed the list about it and Andres was someone who chimed in with helpful advice. I wrote up what I'd discovered in my deep dive here: http://tiny.cc/postgres-concurrency
Turns out Andres was already working on a "better atomics" patch to provide easier methods of using atomic operations within PostgreSQL's code base (my patch was a quick hack probably only valid on x86, if that). It's been useful in removing several performance bottlenecks and—two years in—it looks like it's still paying off.
I hope to have time to play around with it once I have new hardware (I refuse to do performance development on virtual machines).
But honestly, most remaining performance/scalability problems in pg are algorithmic in nature. So micro-optimizations, and that's what I'd call TSX/HLE, aren't likely to be the biggest bottleneck.
However, if the data grows beyond what a single scaled-up machine can handle, take a look at Druid (druid.io). It is a bit more involved to set up than Influx, but was built from the very beginning to scale out horizontally. As a result, I can do realtime analysis (using Grafana) of over 10 billion data points and perform aggregations over said data. It is an incredibly useful tool, and the newly released 0.9 looks even better.
It can also count Alibaba, eBay, Cisco, PayPal, Yahoo, and Netflix as users (among many others): http://druid.io/druid-powered.html
It is really impressive tech. Bonus points that some of the original founders of Druid from Metamarkets just founded a company to do enterprise support around it:
You also have to spend some time to tune the query cost settings to avoid sequential scans if you're only gonna work with a subset of the data. Another optimization could be implementing table inheritance so you have a table for every year. If you work with data sets for a specific year you would get a big performance boost with sequential scans.
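The inheritance-based partitioning idea above can be sketched like this (table and column names hypothetical; declarative partitioning only arrived later, in Postgres 10):

```sql
CREATE TABLE measurements (
    recorded_at timestamptz NOT NULL,
    value       double precision
);

-- One child table per year, with CHECK constraints the planner can use.
CREATE TABLE measurements_2015 (
    CHECK (recorded_at >= '2015-01-01' AND recorded_at < '2016-01-01')
) INHERITS (measurements);

CREATE TABLE measurements_2016 (
    CHECK (recorded_at >= '2016-01-01' AND recorded_at < '2017-01-01')
) INHERITS (measurements);

-- With constraint exclusion enabled, a query filtered to one year
-- only scans that year's child table.
SET constraint_exclusion = partition;
SELECT avg(value)
FROM measurements
WHERE recorded_at >= '2016-01-01' AND recorded_at < '2017-01-01';
```

Inserts have to be routed to the right child (via a trigger or by inserting into the child directly), which is the main bookkeeping cost of this scheme.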
PostgreSQL's biggest weakness at the moment is aggregating data using several CPU workers/cores. This is coming in PostgreSQL 9.6.
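In 9.6 that looks roughly like the following (the table name is hypothetical):

```sql
-- Allow up to 4 parallel workers per Gather node.
SET max_parallel_workers_per_gather = 4;

-- The plan for an aggregate over a large table should now show a
-- Gather node with Partial Aggregate steps running in workers.
EXPLAIN SELECT count(*) FROM big_table;
```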
Oh, and I run PostgreSQL on ZFS with LZ4 compression.
I'm generally a big advocate of ZFS, but I've heard that COW file systems (ZFS and btrfs) are generally not a good choice for database workloads.
How does it perform for you?
Curious since I'm currently researching how PostgreSQL could do better in this space :)
There doesn't seem to be any silver bullet yet. And it is also hard to even see how relational database compares to the existing solutions, since most people dismiss it immediately.
[1] http://stackoverflow.com/questions/17342176/max-distinct-and...
[1]: http://www.enterprisedb.com/success-stories/postgresql-succe...
Unless PG has some timeseries-specific extensions, I had assumed a TS-specific database would be more appropriate. Also curious to try Riak TS.
Disclaimer: I work for Basho, makers of Riak TS.
In partnership with IBM we researched PostgreSQL scalability on modern Power8 servers.
That statement and the linked Russian blog white paper[1] make it seem like a Power8-specific result, and Power8 is "a massively multithreaded chip"[2]. I wonder how far off x86-64 would be? Also, the chart indicates that the benchmarks shown were run on Xeon chips.
The final straw came when I stood up a MySQL 5.6 instance to use as a data warehouse for about 5TB of data (15 billion rows). To my horror, after spending a few weeks on this project, I discovered that MySQL only supported a small subset of the SQL language. Most of the SQL features needed for data warehouse queries were simply not there (no WITH clause, no ROW_NUMBER(), only one sequence per table, no schema support, and many others that SQL report ninjas love to use).
With that I had to go back to management and convince them to put more time into the project so we could try again with PostgreSQL. There was a large learning curve going to Postgres. It was just different in many areas, but at the same time it reminded me a lot of IBM DB2 (both were products of the 1980s if you look at the history). Postgres fit into our plans nicely; it actually had a full-blown intelligent query parser and optimizer, unlike MySQL. From what I learned, MySQL's purpose was to stop people from using text files as storage in the early 2000s. But once you graduate from that, it's onward to Postgres.
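A typical reporting query using the features that MySQL version lacked (the table and columns here are hypothetical):

```sql
-- Top sale per region, using a CTE (WITH) and ROW_NUMBER();
-- neither was available in MySQL 5.6.
WITH ranked AS (
    SELECT region,
           amount,
           ROW_NUMBER() OVER (PARTITION BY region
                              ORDER BY amount DESC) AS rn
    FROM sales
)
SELECT region, amount
FROM ranked
WHERE rn = 1;
```

Without window functions, the same result takes correlated subqueries or self-joins, which get painful fast at 15 billion rows.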
Well shit, SQLite has that market sewn up these days.
Did you not research this beforehand?
No there hasn't:
http://db-engines.com/en/ranking
Which makes sense since they aren't really playing in the same space. The benefits of PostgreSQL are largely lost on typical MySQL use cases (light load, simple CRUD access patterns, limited use of JSON/BSON etc).
You'll see the same phenomenon on Slashdot.
MySQL is popular among a subset of programmers: web developers. In corporations, Microsoft SQL and Oracle are more popular. Further, MySQL is popular among a subset of web developers: those who use PHP. Among web developers who use Python, Postgres seems more popular.
My suspicion is that MySQL's popularity is tied to those web-hosting plans (1 GB of storage! 1 TB of bandwidth! 10 databases! $10/month!). They were almost always MySQL databases. Web hosts that offered Postgres databases were few and far between. This article corroborates my theory: http://rodner.blogspot.com/2008/01/what-makes-mysql-so-popul.... MySQL's company (which at the time was "MySQL" not Oracle) pushed it in at a critical time in the development of the web.
I have written applications that use: Postgres, MySQL, SQLite, Microsoft SQL, and Oracle. My favorite by far is Postgres. The strangest thing to me is not that MySQL is more popular than Postgres, but that anyone uses Microsoft or Oracle at all. Not only do they cost a lot, but from a purely technical standpoint they are worse.
Based on what? Both commercial databases have incredible features, tooling and extensions that leave postgres behind. Postgres today doesn't even have a solid scale-up or scale-out strategy.
It does have nice SQL support and makes developers' lives a little easier, but this isn't anywhere close to making it the absolute winner technically.
Could you elaborate a bit more on why you view Microsoft SQL Server as worse?
A lot of decisions are made holistically instead of a "purely technical standpoint". In large companies where there are still IT departments (e.g. use of Active Directory), MS is still very much the de facto platform, and SQL Server is the "logical" choice.
However, the trend seems to be from MySQL to MariaDB. Red Hat and many other Linux distros, for example, have moved to MariaDB. It doesn't feature Larry Ellison. So why not address it?
Slapdash web developers.
Most web developers I know now consider postgres the base standard.
Also, MySQL is run by Oracle... and Oracle is a horrible company run by horrible people that behave horribly, and it should never ever be trusted in any form. Any developer relying on Oracle code should not be trusted ever again, and Oracle consultants are the vilest form of life on earth.
Did I mention I don't like Oracle?
Say what you like about oracle, they have been a great steward of MySQL.
IME there isn't really a compelling case to use MySQL over Postgres, unless there's a specific environmental constraint.
When a new version of MySQL comes out it's big news, but it seems like lots of the little day-to-day updates in Postgres get posted as it gets developed.
Self-explanatory, really.
- a couple of weeks ago we released a document store (new protocol + connectors) based on protobufs
- last year 5.7 was released. Many new features, I have a list here: http://www.thecompletelistoffeatures.com
I'm trying to get a handle on the different databases, and VoltDB sounds exciting, but everyone's talking about PostgreSQL. Then there's Mnesia which I hear is, as all things Erlang, excellent, though it's kinda tied to Erlang.
I know it's hard to say what's best, but what would you say is the best DB for a completely new multilingual project that needs throughput but prioritizes low latency, for example?
Also, VoltDB is licensed under AGPL. Does this mean that it can't be used in commercial projects? Or is it OK as long as the other components are on different servers or similar?
VoltDB, as an example, is very different: it seems to be designed for simple OLTP workloads. It's an in-memory database, which offers opportunities for impressive performance, but if you have a large amount of data, you'll need a large amount of memory. And horizontal scaling is cool, but cross-partition operations will incur significant overhead.
(IANAL, but the AGPL requires that network users of the software be able to download the source. Since a presumably proprietary application is the client in this case, this isn't likely to be an issue.)
If you don't know the difference, you probably want Postgres.
VoltDB is a specialty database for things like high frequency trading. It wouldn't make sense to use for, say, a consumer app or web startup.
I'd consider something else if I'm really, really sure that it's better suited to whatever niche problem than Postgres, but it'd take a lot of thinking and convincing myself.
This isn't necessarily a bad thing; it's a very easy model to reason about, it rarely has an issue if your cluster doesn't span more than one data center, and it lets you know when it happens via events you can subscribe to. But doing anything different, up to and including healing the partition automatically, is left to the user.
There are also a few warts due to its history: disc-only tables have a rather small max size (so if you're not expecting everything to be stored in RAM as well, you don't want to use Mnesia), indexes are shockingly inefficient for writes, and a few other odds and ends that I don't really remember. For persisted, but fast, storage in Erlang, where partitions aren't common, it's great, but outside of those sorts of use cases there's probably something better.
11.5 How much data can be stored in Mnesia?
Dets uses 32 bit integers for file offsets, so the largest possible mnesia table (for now) is 4Gb.
Ehhhhhhhh, kinda, but not really. A couple of things: [0]
* If you use a disc_only_copies table, your max table size is ~2GB because that's apparently the largest file that DETS will work with. What's more, if your table grows to ~2GB, future writes (to that table) will silently fail until the table size is reduced (!!!). You can kinda get around this limitation by sharding your table [1], but the hash used to determine which key goes to which shard isn't very balanced... sometimes you luck out and your shards are pretty much all the same size. Other times you end up with a few outsized shards.
* However! Everyone on the erlang-questions mailing list says that disc_copies and ram_copies tables are _NOT_ subject to this limit, so they can grow arbitrarily large.
* And, the general consensus seems to be that you really don't want to be using Mnesia in disc_only_copies mode... if you have too much data to fit into RAM, you should probably consider using something else to store your data. This isn't to say that using Mnesia in disc_only mode is bad or hazardous, [2] but that it's substantially slower than disc_copies or ram_copies, and you run the risk of bumping up against the DETS file size limitation.
[0] All observations valid for Erlang 17->18 only. Check Erlang release notes to see if major changes have occurred.
[1] Mnesia handles sharded tables really well, even if the documentation for the feature isn't the best.
[2] I use it in disc_only mode for one of my projects... it's how I learned about that mode's limitations. :p
Do they mean 4GB? Surely 0.5GB/4Gb is a bit small even for 32bit?
Given we're in code freeze anyway, I've not spent a lot of thought on that yet; but I suspect that rearchitecting things so the lines are dirtied fewer times is the better fix, with less potential for regressions at lower client counts.
Compatible up to: 3.4.2
Last Updated: 2 years ago
Active Installs: 400+
WordPress 4.5 is current. Anyone doing WordPress will just skip this if they don't have developers who can fix the issues. Also, while the base may work, a lot of popular plugins (used to?) don't utilize WP_Query or whatever else WordPress offers. It's also probably easier to add indexes to the code or rewrite the logic. Mostly it's bad database code that sucks on MySQL and would likely suck the same way on PostgreSQL.
Besides that, if you use an object cache like memcache or Redis you can avoid a lot of database accesses, and InnoDB seems to be able to deal with concurrent access to tables like wp_comment.
Would be cool to see in WordPress itself but I doubt it.
Sounds like a situation I'm facing: assisting a non-profit org that wants a new website. Some members push for using Wordpress as the CMS. However the org has an established pgsql db containing lots of data that optimally could be employed on the site, e.g., lists of members/events, searchable documents, etc.
Also, there's a separate web app for accessing the database using SQL features mysql doesn't support. Lack of Wordpress/pgsql compatibility makes integrating the existing db and folding in the webapp functionality just about impossible, at least I'm not seeing how that could be done.
Since Wordpress isn't going to embrace pgsql, I've recommended considering alternative CMSs. Decisions are still pending.
How do these changes affect more heterogeneous workflows, of mixed reads and writes? A little better? A lot better? A little worse?
T<what?> Per Second?