Also, the lack of author identity was frowned upon.
Lastly, the conjecture and attitude towards Microsoft lack some substance.
Conclusion: The author is free to write whatever he likes, but take this resource with a pinch of salt.
I use both Postgres and MS SQL Server professionally and whilst philosophically I prefer Postgres, for practical reasons I truly prefer MS SQL Server, if only because of its excellent development tools.
The other is the barrier to entry. I was trying to get NAV2015 working for a demo using some data for a project. I spent at least 20 hours trying to figure out whom to pay (Azure comes with it installed but no license -- why not just sell a bundled license into the hourly cost?), eventually gave up, set up OpenERP in about 30 minutes, spent 3 hours adding the feature I needed, and was done. NAV is unquestionably better for this project, but I'm not going to enter into a support contract with a company just to get a license.
I'm guessing it comes down to virtual cores and such, right? Although I'm surprised how many stupid-simple mistakes people make just by not reading even the basics. (Wait, MSDN includes several Office keys, so obviously they don't want us to buy Office licenses for administrative assistants.)
So there wasn't any substantial critique or response to the specific points he makes?
> Unless otherwise stated I am referring to PostgreSQL 9.3 and MS SQL Server 2014
Which makes his opinions timeless and gives context. A date on it would be welcome, but just from the software version you can figure out if it applies to you or your scenario.
MS SQL:
- The included tools like Management Studio are just great. This is totally next level compared to the Postgres tools.
- Using multiple CPU cores for a single query is really helpful in my scenario.
- Easy continuous backup to the cloud.
- The included Integration Services and Analysis Services are also easy to use and enough for my usage.
Postgres:
- Runs on Linux, so it's cheaper. Even though we use SQL Server through BizSpark (free for now), the Azure Windows VM for it costs us much more than a VM with Linux. And of course when we want to cluster our DB, Postgres is even cheaper.
- Great JSON support. There are parts of our project where it's helpful.
- Better configurability: WAL, checkpoints, etc. We get much better write performance on Postgres than on SQL Server (probably just our case).
The other things really don't make much of a difference. Both DBs can be extended, and extensions can be written in many languages. Both achieve great overall performance, and both have a strong community and a lot of documentation.
So yeah, it'll have better INSERT performance, but the queries will be slower. There's really no way around that. A large data set on disk that's out of order will always be slower than the one that's in-order.
IMO MSSQL makes the right call for the vast majority of use-cases.
PostgreSQL has -no- materialized views (I stand corrected! Introduced in v9.3). No view update support. No partitioned view support. No sane backup/restore process. It's a great database if your primary concern is licensing cost. But if your primary concern is operational cost, then even with just multi-gigabyte data sets it's frustratingly rudimentary compared to what MSSQL delivered over a decade ago.
But that's just me. It's free. And I'm thankful for that. I just find it really frustrating that PostgreSQL supports querying on JSON, but doesn't support backing up and restoring the database in binary format.
9.3
> No partitioned view support
Inheritance (ish)
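For anyone unfamiliar, a rough sketch of how table inheritance can stand in for partitioned views in Postgres (table and column names here are invented for illustration):

```sql
-- Parent table; queries against it also scan the children.
CREATE TABLE measurements (
    id        bigserial,
    logged_at timestamptz NOT NULL,
    value     numeric
);

-- Children inherit the parent's columns; the CHECK constraints
-- let the planner skip irrelevant children (constraint exclusion).
CREATE TABLE measurements_2014 (
    CHECK (logged_at >= '2014-01-01' AND logged_at < '2015-01-01')
) INHERITS (measurements);

CREATE TABLE measurements_2015 (
    CHECK (logged_at >= '2015-01-01' AND logged_at < '2016-01-01')
) INHERITS (measurements);

-- SELECT * FROM measurements automatically unions the children,
-- much like querying a partitioned view.
```

Routing inserts to the right child still takes a trigger or application logic, which is where the "(ish)" comes in.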
> No sane backup/restore process
wat
> doesn't support backing up and restoring the database in binary format
http://www.postgresql.org/docs/9.4/static/app-pgbasebackup.h... is half of what you want, but I expect you're wanting something I don't expect.
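For reference, a typical invocation looks something like this (host, user, and paths are made up):

```shell
# Take a binary (filesystem-level) backup of the whole cluster
# into /backups/base, streaming the WAL needed to make it consistent.
pg_basebackup -h db.example.com -U replicator -D /backups/base -X stream -P
```

The resulting directory can be dropped into place as a new data directory, which is about as "binary backup/restore" as it gets.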
Table clustering is mostly a bet that most access will happen via these columns, which might be a good choice or might not, but it's not something that should happen implicitly or be the only choice available.
I think Markus (2) explains this much better than I do, so I'll just link his text.
(1): which is a pretty stupid name, btw - Oracle calls it "index organized tables", which is a much clearer way to describe the concept.
(2): http://use-the-index-luke.com/sql/clustering/index-organized...
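Worth noting: Postgres can also physically reorder a table to match an index with the CLUSTER command, though unlike a true index-organized table the ordering is a one-off and isn't maintained by later writes (table and index names here are hypothetical):

```sql
-- One-time physical reorder of the table to match the index order;
-- subsequent inserts are not kept in order, so this needs re-running
-- periodically if the access pattern matters.
CLUSTER orders USING orders_customer_id_idx;
```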
Like materialized views, automatic update support for simple views was introduced in 9.3.
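A minimal sketch of what "automatically updatable" means in 9.3+, with invented table and column names:

```sql
-- A simple view over a single table...
CREATE VIEW active_users AS
    SELECT id, name FROM users WHERE active;

-- ...is automatically updatable: this writes through to users
-- without any INSTEAD OF trigger or rule.
UPDATE active_users SET name = 'Bob' WHERE id = 1;
```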
Well, you could just leave everything as a heap in MS SQL. Someone might throw things at you, but you could do it.
Or you can pad the index to allow for inserts (or do some partitioning) and schedule some rebuilds to deal with the fragmentation.
If you can expense the cost of the license you should definitely check it out.
Plus it's completely cross-platform and runs on Linux, Windows and OS X.
[1]: http://www.dbvis.com
Which sometimes made me wonder if, with Postgres becoming more popular by the day, there would be a market for a management tool cheaper than Navicat.
CSV support - if you do that much CSV extract/transform/load (or indeed, any kind of ETL work), use an ETL tool. SQL Server comes with SQL Server Integration Services for that kind of thing.
Ergonomics of dropping and creating tables and stored procedures - the author's example is probably the toughest way I can think of to drop a table. It's easier to check sys.all_objects (which will catch anything - functions, views, procs, etc).
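Something along these lines, for example (the object name is just for illustration):

```sql
-- Drop the table only if it exists, using the catalog view;
-- type 'U' restricts the check to user tables.
IF EXISTS (SELECT 1 FROM sys.all_objects
           WHERE object_id = OBJECT_ID(N'dbo.MyTable') AND type = 'U')
    DROP TABLE dbo.MyTable;
```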
Operating system choice - well, in 2015, if you're going to mention that, you should be thinking cloud services anyway. They're both available in the cloud - at which point, who cares what OS it runs on?
Goes on and on. I'm a SQL Server guy, and if I was going to make a list of how to choose a relational database platform, here's what I'd probably list:
* Up-front cost (license, maintenance)
* Ease of finding staff, and their costs
* Ease of finding tooling, and its cost
* Ease of finding scalable patterns (proven ways to grow it)
I don't think either platform is a loser on those counts - it's hard to go wrong with either one.
Obviously on Windows it would be IIS, so overall having the OS choice is a nice thing. The fact that you can choose which one to host with means you get something you're comfortable with.
Have you seen Windows Core and PowerShell? An increasing number of admins are doing just that - it's how we work at Stack Overflow, for example.
Why? CSVs are portable and much easier to deal with (when exported correctly, which PostgreSQL does and MS SQL does NOT).
They were usually processed using Oracle external tables though.
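For reference, correct CSV export in Postgres is a one-liner with COPY (table name and path are made up):

```sql
-- Server-side export with a proper header row and standard quoting:
COPY (SELECT * FROM orders WHERE placed_at >= '2015-01-01')
    TO '/tmp/orders.csv' WITH (FORMAT csv, HEADER);

-- Client-side equivalent from psql, no server filesystem access needed:
-- \copy (SELECT ...) TO 'orders.csv' WITH (FORMAT csv, HEADER)
```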
I'm so tired of this. Just because you don't already know Powershell and are too lazy to learn doesn't mean it doesn't exist. A know-nothing Windows user might as well say, "If I want to select and move arbitrary files on Windows, I can point and control-click in seconds. How would you do that in the Linux CLI?" In both instances, it comes across as ignorant to anyone who actually knows the ecosystem being derided.
It's been a while, but I am pretty sure all you have to do is put GO before/after the CREATE PROCEDURE. I'm absolutely positive there's some way around it, because I've run many, many such scripts on SQL Server without manual intervention.
EDIT: Yes, I just fired up a VM and ran this and got the expected results with no errors.
CREATE DATABASE HackerNews;
CREATE TABLE dbo.Test (id int IDENTITY(1, 1), name varchar(20));
GO
INSERT INTO dbo.Test (name) VALUES ('Amezarak');
GO
CREATE PROCEDURE dbo.sp_QueryTest AS SELECT * FROM dbo.Test;
GO
EXEC dbo.sp_QueryTest
In my personal opinion, MSSQL (including the tooling around it) is awesome and possibly one of Microsoft's best products. I actually regret not getting to use it anymore since a) my current job doesn't use it and b) I'm not shelling out for a license for my side projects. Postgres is definitely my next pick, though, and both are miles ahead of MySQL. I understand that MySQL is "good enough" for most people, but it's always painful going back to it and inevitably remembering almost all my favorite features don't exist. I'm stuck with it on a side project and it's frustrating.
There are some other things like that in the post, some of which come down to what seems like syntactic sugar and feel more like differences that don't really need to be noted (like the types/dropping piece).
I work at an MSSQL shop, and all of us know and are convinced that PG is better. Most of us use PG for our side projects, and some of the devs don't even use Windows that much, thanks to some custom MSSQL plugins we've built for Linux. However, the problem still exists: how do you port a ton of databases over to Postgres? We're a multi-tenancy shop, so close-to-zero downtime is very important, and it would get really complicated if we ran multiple production versions of our app, one with a PG adapter and one with an MSSQL adapter.
A cursory Google search will show that you aren't going to get a ton of help converting them[0][1], not to mention the overhead of switching 20 developers from MSSQL to PG overnight.
This website, however, is excellent at convincing people to use PG over MSSQL. Perhaps, given the direction that Microsoft is going, they'll open source MSSQL overnight and it will become something competitively similar to PG in the long run.
[0] http://www.convert-in.com/mss2pgs.htm
[1] https://wiki.postgresql.org/wiki/Microsoft_SQL_Server_to_Pos...
If you're already a licensed MSSQL customer, I'm not sure what advantages PostgreSQL could really have, given its slower performance and much higher operational costs.
PostgreSQL has better performance for some loads (as far as I know there is no clear winner in performance), better GIS support, writable CTEs, excellent built-in JSON and array support, an amazing CLI, and personally I think PostgreSQL has nicer built-in types and functions.
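A writable CTE, for example, lets you move rows between tables in a single statement (table names here are invented for illustration):

```sql
-- Move expired rows into an archive table atomically:
WITH moved AS (
    DELETE FROM sessions
    WHERE expires_at < now()
    RETURNING *
)
INSERT INTO sessions_archive
SELECT * FROM moved;
```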
TempDB, at least last I looked, doesn't really scale too well.
Just an observation.
PostgreSQL supports the RETURNING clause, allowing UPDATE, INSERT and DELETE statements to return values from affected rows. This is elegant and useful. MS SQL Server has the OUTPUT clause, which requires a separate table variable definition to function. This is clunky and inconvenient and forces a programmer to create and maintain unnecessary boilerplate code.
So I have the equivalent of the following in one of my projects:
UPDATE sometable SET someField = @parameter
OUTPUT Inserted.field1, Inserted.field2
WHERE ...
Now, either I don't get the limitation the author describes, or SQL Server can do that - return information from the affected rows. It works with DELETE as well. We can argue that 'inserted' is a crappy name here, but...

Between the overloading of WITH for seemingly every new feature they add, the way semicolon terminators are only sometimes required (so I just automatically start all CTE declarations with ";WITH" to be sure it works), and the ways they break encapsulation by incorrectly scoping cursor names and limiting INSERT...EXEC so that it can't be nested, programming it at an advanced level is an acquired taste at best.
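For comparison, the rough Postgres equivalent of the OUTPUT example above, using RETURNING (same hypothetical table and columns):

```sql
-- RETURNING hangs off the end of the statement; no table
-- variable or separate definition is needed.
UPDATE sometable SET someField = 'new value'
WHERE ...
RETURNING field1, field2;
```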
1) I asked about a specific thing in the article. I think the author is wrong.
2) If 1 holds true I think that the article might be questionable - at least I don't trust the rest now. I am no expert on All Things SQL Server and if I spot a flawed point as a random dev in something that was supposedly written by someone working with databases for a decade [1], maybe people that actually know a lot more spot .. more flaws.
1: I'm a dev for longer than that and certainly know my way around sql and databases, but focus matters. The author claims "I know a fair bit about these things" and considers databases his main area of expertise it seems.
But I just tested what you claimed here and it definitely works exactly the way the RETURNING clause would.
Weird.
Don't get me wrong, I have 'political' problems with MSSQL, but those shouldn't be disguised as technical ones.
Well, not anymore. Now I just refuse to do shit. "We can't change it" turns into "we contacted the original developer and made him fix his broken shit" when I turn into a complete pain in the ass.
And I don't need a certification to understand the crazy MS licensing (I discovered recently that yes there is a MS certification program for their licenses as it's so byzantine).
For what it's worth, Atwood chose Postgres for his Discourse product. The FAQ says the product is "Uncompromisingly open source." And "There is only one version of Discourse--the awesome version. There's no super secret special paid commercial version with better or more complete features."
Actually, if you drop the constraints first, you don't have to worry about the table-drop order. I do that on both Postgres and SQL Server, because I use a tool that generates the specific scripts for me.
I don't think manually writing DDL scripts is any way to manage an RDBMS in the modern era, especially if you care about "repeatability and auditability" like he claims.
Queries, yes, but schema constructs, no. SQL is just a terrible language for it. I'm not saying you have to use a full ORM, but every SQL engine I've ever encountered will let you do a lot of nutty things and won't complain about it, or won't complain until you actually try to query it.
OTOH, when not on a Microsoft stack, it has been a constant source of hard to track down problems and bugs. I remember trying to use Microsoft's JDBC driver a few years ago in a Java web app and running into all kinds of nasty and unbelievable bugs in the actual driver they shipped.
>This is an advantage for MS SQL Server whenever you're running a query which is CPU-bound and not IO-bound. In real-life data analytics this happens approximately once every three blue moons. On those very rare, very specific occasions when CPU power is truly the bottleneck, you almost certainly should be using something other than an RDBMS. RDBMSes are not for number crunching.
As a data analyst, the tools to be comparing shouldn't be RDBMSs.
>As I said in the banner and the intro, I am comparing these databases from the point of view of a data analyst, because I'm a data analyst and I use them for data analysis. I know about SSRS, SSAS, in-memory column stores and so on, but I haven't mentioned them because I don't use them (or equivalent features). Yes, this means this is not a comprehensive comparison of the two databases, and I never said it would be. It also means that if you care mostly about OLTP or data warehousing, you might not find this document very helpful.
As for this part, data warehousing, OLAP services, and reporting services (lower case on purpose here) are a very large sub-domain within data analytics. I am not saying that these are everything in analytics, but especially from an enterprise standpoint, these make up the bulk of it. From a tooling and full-stack standpoint, Microsoft is quite strong in this segment.
I hate how MSSQL has gone back on their word to let customers benefit from CPU enhancements. They mocked Oracle for charging by type and core... And now they do the same.
Also, multiple result sets was a sorely missed feature when porting stuff to PG. But record types made up for it.
Of course now, the dominating factor for a lot of people is "Will a hosting provider (Azure, AWS, Google) just run this for me, automatically giving me perfect backups and restore and HA?" SQL Azure, as I understand, not only does backups, but allows you to restore to arbitrary points in time. Sure it's just keeping txlogs, but that sounds hot when sold like that. For many cases, I can see ditching the privacy issues of "cloud" to get those features with zero capex or management overhead.
It does, but backups take around an hour. A restore takes about that much time. Changing the performance level of a DB takes a similar amount of time.
I'm not sure how quick Amazon or Google are, but I know a lot of ops guys who are sorely disappointed by Azure's slowness.
I tried Google Compute Engine on a whim and I'm totally in love (despite a deep distrust of Google). Everything is fast. VMs load in seconds. And it's simple - no inane UI, no crazy leftover bits from being a PaaS. No idiotic design for SSD. And as a kicker, it's half the price for compute. Oh, and SSH client right there in the portal? It's such a small thing but really made me happy.
It's just that Google doesn't do startup outreach and give us cash and court us. Unless you're in an "established" incubator or yc or something. Whereas MS is super friendly and does everything they can.
This is pretty ridiculous and quite a bit insulting to the many people who do work for vendors. I work in a team that has a number of engineers supplied by vendors and they are generally fantastic. Highly qualified, more than happy to assist with tasks that aren't to do with their product and they really care about the overall project outcomes.
Open source has forced vendors to make sure that every project that uses their products is a success.
> On the other hand, commercial software is often designed by committee, written in cube farms and developed without proper guidance or inspiration
Again more nonsense. Not every open source project is some beacon of perfection and neither is every commercial product some poorly designed piece of junk. Anyone that believes otherwise is just being disingenuous.
Someone really needs to explain to me why PostgreSQL users in particular seem to always want to bash the competition in order to justify their technology choice. It's been going on for years against MySQL/Oracle first, then MongoDB/NoSQL and now SQL Server. It's odd.
"Open source products have support from people who've been told they need to 'contribute' to open source, and since they can't code, they just try to look at code and answer questions on a mailing list until they get an interview at Facebook".
I think it's less about PostgreSQL, and more about the particular products:
1. Oracle has always been a piece of crap from a technology standpoint. There are good reasons businesses use it, but they don't have to do with a robust, core product.
2. Early versions of MySQL were a joke. It's pretty good now, but for a long time, it was the butt of many jokes for good reason. It was very easy-to-use, and fast if you didn't need data consistency, but it didn't quite work right.
3. Stonebraker aside, I haven't heard much bashing of NoSQL. It's used in many places where it's the wrong tool for the job (JOINs are useful), but it's great where it is the right fit. It's just that MongoDB, in particular, isn't great if you have data integrity or performance requirements you care about.
4. I don't even know where to start on SQL Server. Seriously. The only reason to use that dog is if you're tied to a Microsoft-only shop.
There are lots of great technologies out there -- Cassandra, Google's Bigtable, memcached, modern versions of MySQL, etc. -- which I've never heard people mocking. Conversely, users of most of those tend to make fun of the broken databases just as much as PostgreSQL users do. It's just that you see the PostgreSQL users doing more mocking simply because there are more of them out there.
Footnote: I'm developing on a MySQL+MongoDB stack right now. MySQL is great, but MongoDB is a bad joke.
(a) Not all vendors are the same, (b) Not all situations are the same, (c) Open source isn't exactly world renowned for porting back fixes to older releases.
Making broad generalisations is never helpful.
Very smug and condescending statement. Feels like an insecurity on the author's part.
IMO, completely unsuitable for web apps where DBA is not sitting around 24/7.
Is PostgreSQL better at this?
I think it's better to skip that and interface with the database using something more powerful - let the database handle getting the data that you want the best way possible, then process it using something better suited to the job.
MySQL doesn't do a great job of protecting your data compared to Postgres, MSSQL, Oracle, etc.
I can do that to Postgres all day and the on-disk data integrity is preserved.
Out of those things, postgres has a DBA tool out of the box. Yeah... nice comparison.
I find it particularly funny that he cites Dunning-Kruger. The DK effect isn't meant to be applied to others; it's meant for self-evaluation.