I should add that Postgres was the first database I ever used, and I learned pretty much everything I know about Postgres, SQL, and relational and set logic from the official docs. And that was with no background in software development and an undergraduate business degree, with Excel being my most technologically advanced tool. That is a documentation success story.
With the steady addition of features it has gotten much more complex, and there will come a time when the documentation alone won't be enough to learn how to use PostgreSQL to full advantage. With the release of 9.5, we might be there now.
Perhaps the logical extension of the documentation is some form of coursework to let users learn the DB systematically. I haven't looked into it; this might already be offered.
- http://sequel.jeremyevans.net/documentation.html
- http://sequel.jeremyevans.net/rdoc/files/doc/postgresql_rdoc...
Sure, if you already understand PostgreSQL. If you don't, then it isn't particularly user friendly. It's awkward to navigate, doesn't explain the basics (how to actually install it), and tends to put information aimed at advanced users on the same page as beginner material.
Compare your standard with mine (MongoDB):
Er, http://www.postgresql.org/download/ and/or http://www.postgresql.org/docs/current/static/installation.h... ?
> ...tends to combine information that is more for advanced users on the same page as beginners.
There's a tutorial for beginners: http://www.postgresql.org/docs/9.4/static/tutorial.html
Then there's reference material in the rest of the manual. The tutorial even suggests the order in which you should read the rest of the manual. :)
Dang/mods: Can we please please get something to avoid accidentally downvoting good stuff on mobile phones?
I also think that PLv8 should probably make it in the box in the next release or two. With the addition of JSON and Binary JSON data options, having a procedural interface that leverages JS in the box would be a huge win IMHO. Though I know some would be adamantly opposed to this idea.
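For anyone who hasn't tried it, PLV8 is currently installed as a third-party extension; a minimal sketch of what a function looks like (the function name here is invented for illustration):

```sql
-- assumes the plv8 extension is available on the server
CREATE EXTENSION IF NOT EXISTS plv8;

-- a trivial function whose body is plain JavaScript
CREATE FUNCTION plus_one(x int) RETURNS int AS $$
  return x + 1;
$$ LANGUAGE plv8;

SELECT plus_one(41);  -- 42
```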
Eeeh... a simple HA solution can be developed in about a week (I was able to do so on 9.3, and so far it has held its ground). Also, now with 9.5's pg_rewind you can easily switch back and forth between nodes (http://www.postgresql.org/docs/9.5/static/app-pgrewind.html), simplifying things a great deal. Can't imagine that's 5-6 figures.
I agree that you don't get a Plug&Play-Solution out of the box, but from anecdotal evidence they often don't quite work as advertised anyway (remember 1995? And I'm sure your friendly DBA has some stories to share as well).
If you're going to be hosting your db on something like AWS EC2 anyways, then just buy a db product like AWS RDS, and pay for the HA option. Ends up around the same price as if you'd set up everything yourself (assuming you were going to host on AWS anyways, and not going with a low cost option), and is very easy.
col1 | col2 | count
-----+------+------
a    | b    |    10
a    | null |     5
null | b    |     5
There's more to it than that obviously, but you can read about them here: http://www.postgresql.org/docs/devel/static/queries-table-ex... (7.2.4. GROUPING SETS, CUBE, and ROLLUP)

From the web server, connect to the db as one user, then SET ROLE to the database user. This gives you column security and easier auditing as well. See http://stackoverflow.com/questions/2998597/switch-role-after...
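For illustration, a GROUPING SETS query that produces output shaped like the table above could look like this (the table name is invented):

```sql
-- count by (col1, col2), by col1 alone, and by col2 alone, in one pass;
-- null marks the column(s) not part of a given grouping set
SELECT col1, col2, count(*)
FROM some_table
GROUP BY GROUPING SETS ((col1, col2), (col1), (col2));
```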
What particularly are you looking for in a "getting started" tutorial? Honestly, you should just plunge in, on some side project (or a mirror of whatever projects you've used MySQL on) and just compare.
This is a lot easier to say than do/live by, but I think you shouldn't invest in one tool choice when you haven't given the others a fair shake (once you have enough time to step back and think about your decision).
A few things you will want to look at that are different:
1. data types are much richer and more useful than in mysql
2. transactional DDL means migrations are atomic.
3. schemas are what mysql refers to as databases. Remember to set `search_path`.
4. roles and grants are somewhat more expressive and work differently than in mysql, but not that differently for the simpler use cases
5. database functions (aka stored procedures) are awesome, as are extension languages.
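Point 2 in practice, as a sketch (table and column names invented): a failed or aborted migration rolls back cleanly, DDL included.

```sql
BEGIN;
ALTER TABLE users ADD COLUMN age int;
CREATE INDEX users_age_idx ON users (age);
-- if anything in this transaction fails, nothing is applied:
ROLLBACK;  -- both the column and the index are gone again
```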
But they are doing absolutely nothing about my biggest beef with PostgreSQL. Which is that there is absolutely no way to lock in good query plans. It always reserves the right to switch plans on you, and sometimes gives much, much, much worse ones. No other database does this to me. Even MySQL's stupid optimizer can be reliably channeled into specific query plans with the right use of temporary tables and indexes.
This is a problem because improvements don't matter if the query plan is "good enough". But they will care if you screw up. PostgreSQL usually does well, but sometimes screws up spectacularly.
The example I have struggled with most often in the last few months is a logging table that I create summaries from. Normally I only query minutes to hours, but I set it up as a series of SQL statements, so I first put the range in a table and then filter with happened BETWEEN range_start AND range_end. PostgreSQL really, Really, REALLY wants to decide that the index on the timestamp is a bad idea and do a full table scan instead. Every time it does, summarization goes from under a second to taking hours.
Hopefully the new BRIN indexes will be understood by the optimizer in a way that makes it happier to use the index. But I'm not optimistic. And if I lean on it harder, I'm sure from past experience that I'll find something else that breaks.
But perhaps I have some practical advice for you to try.
I had a similar issue: I have a few tables with sensor data, 300-500 million rows, indexed among other things by event type. Some counting queries kept defaulting to full table scans. It turned out that this was because of limited statistics on the distribution of counts by event type.
The default_statistics_target config parameter sets how many entries Postgres keeps in the histogram of possible values per column, the default is 100 I think. Because my event types were not evenly distributed, the less frequent ones were missing from the statistics histogram altogether, and somehow this resulted in bad query plans.
As a fix, I upped default_statistics_target to 1000, and set it to 5000 for the biggest tables. Then, after a VACUUM ANALYZE, the query planner started making sensible choices.
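The settings described above, as a sketch (table and column names are invented; the numbers are examples, not recommendations):

```sql
-- raise the global default (also settable in postgresql.conf)
ALTER SYSTEM SET default_statistics_target = 1000;
SELECT pg_reload_conf();

-- finer-grained: bump a single skewed column on a big table
ALTER TABLE sensor_events ALTER COLUMN event_type SET STATISTICS 5000;

VACUUM ANALYZE sensor_events;  -- rebuild the statistics
```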
Another thing to try is reducing the random_page_cost config parameter from its default of 4.0. On SSDs, random page costs are much closer to 1 than to 4 (relative to long sequential reads).
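For example (4.0 is the shipped default; 1.1 is a commonly used SSD value, not an official recommendation; the query is invented):

```sql
-- experiment per-session before changing postgresql.conf
SET random_page_cost = 1.1;
EXPLAIN SELECT * FROM sensor_events WHERE event_type = 7;
-- check whether the plan now flips from seq scan to index scan
```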
My problem is not that PostgreSQL does not understand the distribution of my data. It does. The problem is that it comes up with a query plan without realizing that I'm only querying for a very small range of timestamps.
If this happens again, I'll have to try rewriting code to send it queries with hard-coded timestamps, cross my fingers, and pray. I find prayer quite essential with PostgreSQL sometimes because, as ineffective as it is, at times I've got nothing else.
That'd not only make production scenarios more reliable, but it'd also make it much easier to test tweaks to the cost model in practice.
The problem is that that's a rather significant project, and it's hard to get funding for that kind of work. The Postgres companies aren't that big, and it's not all that sexy a feature, marketing-wise.
I also don't think that query hints are a good way to do it. And I don't mind if the way to do it is somewhat cumbersome. This is very much a case where 20% of the work can give 99% of the benefit.
For example what about the following approach?
1. Add an option to EXPLAIN that will cause PostgreSQL's optimizer to spit out multiple plans it considered, with costs, and with a description of the plan that PostgreSQL can easily parse and fit to a query.
2. Add a PLAN command that can be applied to a prepared statement and will set its plan. It is an error to submit a plan that does not match the query.
And now in the rare case where I don't like a query's plan I can:
EXPLAIN PLANS=3 (query);
Pick my desired plan from the list (hopefully).

Then in my code I:
PREPARE foo AS (query);
PLAN foo (selected plan);
EXECUTE foo;
And now if I notice that a query performs worse than I think it should, I can make it do what I want it to.

The typical solution is to modify the autovacuum settings for that table to recalculate the statistics a lot more often, and maybe even with much higher resolution, depending on your case.
You can also convince it that indices are the way to go by changing more basic settings about costs of reading a random page on disk vs reading sequentially, making full table scans more expensive, but tuning those settings away from realistic costs might have negative side effects for you.
I was able to have great success running complex queries on 100M+ row tables that were insert-only using this kind of trick, but YMMV. If nothing else fails, really experienced people are more than willing to help in the performance mailing list. They sure helped me quite a few times.
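Per-table autovacuum overrides look like this ("logs" is an invented name; the thresholds are examples, not recommendations):

```sql
ALTER TABLE logs SET (
  autovacuum_analyze_scale_factor = 0.0,   -- disable the fraction-based trigger
  autovacuum_analyze_threshold    = 10000  -- re-analyze after every 10k row changes
);
```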
Adjusting internal costs is promising, but I'd like to avoid going there exactly because of the possible negative side effects that you mention.
SET enable_seqscan = OFF;
We use these options a lot on tables that otherwise produce odd query plans, to steer the planner toward the best option.
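One way to keep such overrides from leaking into unrelated queries is to scope them to a single transaction, e.g. (the query itself is invented):

```sql
BEGIN;
SET LOCAL enable_seqscan = off;  -- reverts automatically at COMMIT/ROLLBACK
SELECT count(*) FROM logs
WHERE happened BETWEEN '2015-06-01' AND '2015-06-02';
COMMIT;
```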
What scares me is that this is unreliable, and according to the documentation, the optimizer is free to choose to ignore everything that I say whenever it wants. The fact that it already HAS done that to me does not provide me comfort.
"RLS implements true per-row and per-column data access control"
The reason is that the optimizer reorders operations. So, a tricky person can write the query in a way that, for example, throws a divide-by-zero error if someone's account balance is within a certain range, even if they don't have permission to see the balance. Then they can run a few queries to determine the exact balance.
RLS builds on top of something called a "security barrier view" which prevents certain kinds of optimizations that could cause this problem.
It also offers a nicer interface that's easier to manage.
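The 9.5 syntax, as a sketch (table and column names invented): a policy restricts which rows each role can see, and the security-barrier machinery keeps the planner from pushing untrusted expressions below that check.

```sql
ALTER TABLE accounts ENABLE ROW LEVEL SECURITY;

-- each database user sees only their own rows
CREATE POLICY own_rows ON accounts
  USING (owner = current_user);
```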
I know applications can be compromised, but now you can freely share your DB with other teams to analyze or play around with.
It sounds like it's basically a BTree that stops branching at a certain threshold, but I'm almost certainly wrong.
Obviously single row accesses in a fully cached workload are going to be faster if done via a btree rather than such range maps, even if there's perfect clustering. But the price for having such an index is much lower, allowing you to have many more indexes. Additionally it can even be more efficient to access via BRIN if you access more than one row, due to fewer pages needing to be touched.
Slides are here: http://hlinnaka.iki.fi/presentations/Index-internals-Vienna2...
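Creating one is a one-liner; e.g. on an append-only log table (names invented):

```sql
-- pages_per_range trades index size against scan precision
CREATE INDEX logs_ts_brin ON logs
  USING brin (ts) WITH (pages_per_range = 32);
```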
I'm really, really keen to use 9.5 jsonb with its insert/update changes.
My plan is to rely on these functions for now, and switch to the native implementations once 9.5 is production ready on RDS.
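The 9.5 additions this refers to include jsonb_set and the || concatenation operator; a sketch (table and column names invented):

```sql
-- replace one key inside a jsonb document
UPDATE docs
SET body = jsonb_set(body, '{title}', '"New title"')
WHERE id = 1;

-- merge two jsonb values (the right side wins on conflicting keys)
SELECT '{"a": 1}'::jsonb || '{"a": 2, "b": 3}'::jsonb;
```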
I want to look deeper into this but haven't had the time. From the little I read, it seems MS SQL's MERGE is more powerful.
The reason PostgreSQL went with this syntax is that the goal was to create a good UPSERT, and getting the concurrency considerations right with MERGE is hard (I am not sure of the current status, but when MERGE was new in MS SQL it was unusable for UPSERT). Even when you have done that, it would still be cumbersome to use for UPSERT.
EDIT: The huge difference is that PostgreSQL's UPSERT always requires a unique constraint (or PK) to work, while MERGE does not. PostgreSQL relies on the unique constraint to implement the UPSERT logic.
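For reference, the 9.5 UPSERT shape (table name invented; note the unique constraint the ON CONFLICT target requires):

```sql
CREATE TABLE counters (key text PRIMARY KEY, value bigint NOT NULL);

INSERT INTO counters (key, value) VALUES ('hits', 1)
ON CONFLICT (key) DO UPDATE
SET value = counters.value + EXCLUDED.value;
```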
I've used MERGE as an UPSERT using MATCHED/NOT MATCHED and SERIALIZABLE/HOLDLOCK since it was introduced in mssql 2008. It was one of the first features I upgraded my code to use, and it worked out of the box with no issues.
Here's a great post from Postgres team showing why they didn't just implement merge themselves:
http://www.postgresql.org/message-id/CAM3SWZRP0c3g6+aJ=YYDGY...
Edit: I meant apt.postgresql.org of course, not the official Ubuntu repo..
Has anyone come across a guide to using it for upgrades?
The very first example points to BRIN indexes resulting in a smaller index than btree but with a much longer search time... so I guess the 5% time figure was very use-case specific?
We do over 20,000 queries per second on one of our production mysql DB's and I'm not sure I'd trust anything else with that: http://i.imgur.com/sLZzXhS.png
Just curious if I'm missing out on some new awesomeness that PostgreSQL has or if it's just marketing.
In a word: correctness.
Yes, MySQL has an UPSERT implementation. Like so many things MySQL rushed out the door, it's also buggy and unpredictable. Did you know UPSERTing into a MySQL table with multiple unique indexes can result in duplicate records? Did you know MySQL's INSERT IGNORE will insert records that violate NOT NULL constraints? [1]
I've used both MySQL and PostgreSQL for over a decade, and working around the many MANY misbehaviors and surprises in MySQL requires continuous dev effort. PostgreSQL on the other hand is correct, unsurprising, and just as performant these days.
MySQL is what happens when you build a database out of pure WAT [2].
[1] https://wiki.postgresql.org/wiki/UPSERT#MySQL.27s_INSERT_......
You say "upserting into a mysql table". Which storage engine? MyISAM? InnoDB? I find MySQL to be both reliable and incredibly durable, i.e. it handles yanking the power cord quite well. The performance also scales up linearly for InnoDB, even for very high traffic and concurrency applications.
We use redis, memcached and other storage engines - by no means are we tied to mysql. But for what it does, it does it incredibly well.
I'm also completely open to using PostgreSQL and I was hoping someone could give me a compelling reason to switch to it or to use it.
People like myself use Postgres because it has a much richer feature set. See http://stackoverflow.com/a/5023936/270610 for some examples. Personally I find MySQL beyond frustrating due to its lack of… well almost all of those. Recursive CTEs in particular, but arrays and rich indexing are pretty core too.
Postgres's query optimizer is far more advanced too. MySQL doesn't even optimize across views, which discourages good coding practices.
The documentation is fantastic. Complete and well-written, covers the nuances of every command, expression, and type. MySQL's doesn't hold a candle to it.
Don't know what you mean about "unfriendly". Help is built into the command-line tool, and like I said, the documentation is fantastic. Maybe MySQL is a little more "hand-holdy", but I don't care for such things so I wouldn't know.
Back in the early 2000s, LAMP people on Slashdot were benchmarking MyISAM tables against Postgres 6.5/7.x's fully transactional engine. Unfortunately, the reputation for being slow stuck among developers.
Postgres particularly shines on multicore systems, thanks to some clever internal design choices. Having a sophisticated cost-based query planner also helps.
As for unfriendly: Care to amplify? In my work, I've found the opposite to be true.
For example, the very first thing you tend to encounter as a new developer is "how to create a user". For MySQL, it turns out that using GRANT to grant a permission creates a user, which is counterintuitive; GRANT also sets the password, and promotes the use of cleartext passwords. By comparison, Postgres has "createuser", as well as a full-featured set of ALTER USER commands. The difference between "mysql" and "mysqladmin" is also completely unclear.
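The Postgres side of that, for comparison (user and table names invented):

```sql
CREATE USER alice PASSWORD 'changeme';      -- or: createuser --pwprompt alice from the shell
GRANT SELECT ON mytable TO alice;           -- grants never implicitly create users
ALTER USER alice VALID UNTIL '2016-12-31';  -- part of the full ALTER USER toolkit
```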
The almost complete lack of warts and legacy cruft in Postgres significantly removes the possibility of confusion, uncertainty and information overload. MySQL's manuals are littered with "if X is enabled then this behavior is different, and in versions since 5.7.3.5 this behavior has been changed slightly, and 5.7.3.6 has a bug that silently swallows errors", etc. MySQL's historical date and NULL handling alone is worth a chapter of any book.
Postgres also has a level of strictness above MySQL, which is in itself instructive. You know when you're doing something wrong. Postgres never accepts bad input. It always requires a strictly correct configuration setup.
Plus: Just type \h in psql. It has a complete reference of the entire SQL syntax.
It wasn't just the /. crowd. Back in the 3.5 days the MySQL devs were doing that too, and writing long discourses on why transaction safety was a bunch of crap and a crutch for bad application developers.
- CTE's
- Arrays/JSON type
- partial indexes
- transactional DDL
- NOTIFY
- Materialized views
- Schemas
- PostGIS
- Row level security (which is new in PG)

- Arrays, particularly with GIN indexes. This makes things like tagging fantastic in Postgres. Instead of putting your tags in another table, you throw them in an array and you can do all kinds of things like set-intersection queries.
- JSON. Postgres can store data as JSON and index and query the JSON. This essentially gives you MongoDB type queries.
I'm sure there's more but these are my favourite Postgres features.
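The tagging pattern described above, sketched (table and column names invented):

```sql
CREATE TABLE posts (id serial PRIMARY KEY, tags text[]);
CREATE INDEX posts_tags_idx ON posts USING gin (tags);

-- posts tagged with BOTH 'postgres' AND 'release'
-- (array containment, answered via the GIN index)
SELECT id FROM posts WHERE tags @> ARRAY['postgres', 'release'];
```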
Postgres has always taken a more 'solid' approach. One instance that made my jaw drop when I realized it: in the past (has this been fixed?), DDL statements (ALTER TABLE, CREATE TABLE, etc.) were not transactional in Mysql. You could get 50% of the way through a series of them and find your database 100% fucked up.
That said, over the years Mysql has been improving too, for sure.
So honestly, at this point I do think some of it is impressions from the past which are no longer valid. But still, Mysql has done/does all sorts of things that defy the spec, convention, or just common sense (I don't know if this has been fixed, but at least for many years April 31st was treated as a valid date, and there was some profound weirdness, of which I can't quite recall the details, involving locale stuff).
Postgres generally takes the position that data should always be safe first and speedy sometime later. It also assumes the operator understands their tools. That second one means in comparison with Mysql, people think it is unfriendly. It isn't (if you want to see unfriendly, go work with Oracle), it just expects that its friends learn about it. Which is of course good advice when you're dealing with complicated software on which a lot of money tends to ride.
And as the PG devs have said for years, they don't compete with Mysql. They compete with Oracle. There's no reason to switch if you're happy with Mysql.